mirror of
https://github.com/harvard-edge/cs249r_book.git
synced 2026-05-06 17:49:07 -05:00
refactor(staffml): retire prod static-fallback; opt-in dev-only (#1598)
The bundled corpus.json was serving as a prod safety net behind the
Cloudflare Worker. Post-cutover the Worker has been the real data
source, and the static path was silently degrading rather than helping
(corpus.json is a generated artifact whose prose `details` are blank
in corpus-summary.json). This change:
- Stops emitting corpus.json in the publish-live workflow
- Removes the Worker-error fallback in getQuestionFullDetail — errors
now propagate to useFullQuestion and the UI shows a "details
unavailable" banner instead of silently filling blanks
- Drops the localhost auto-trigger in shouldUseStaticDetails — the
static path now requires explicit NEXT_PUBLIC_VAULT_FALLBACK=static
- Switches taxonomy.ts to corpus-summary.json (was corpus.json)
- Rewrites the publish-live smoke tests against corpus-summary.json
- Collapses validate-vault.py to sparse-only (per-question deep
validation lives in `vault check --strict`)
Static-fallback remains as an OPT-IN local-dev affordance: set
NEXT_PUBLIC_VAULT_FALLBACK=static and run `vault build --legacy-json`
to materialize corpus.json. The Function-constructor dynamic import
keeps Turbopack from requiring corpus.json at build time.
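The Function-constructor trick mentioned above can be sketched as follows. This is a minimal illustration, not the repo's exact code; `loadStaticCorpus` is a hypothetical helper name. Because the import specifier is assembled at runtime inside a `Function` body, the bundler's static analyzer never sees it, so the build does not require the file to exist:

```typescript
// Hypothetical sketch of the Function-constructor dynamic import.
// The specifier is hidden from static analysis; if the file was never
// materialized, the import rejects and we map that to null.
const dynImport = new Function("p", "return import(p)") as (
  p: string,
) => Promise<{ default: unknown[] }>;

async function loadStaticCorpus(path: string): Promise<unknown[] | null> {
  try {
    const mod = await dynImport(path); // resolves only if the JSON exists on disk
    return mod.default;
  } catch {
    return null; // missing file: caller surfaces "details unavailable"
  }
}
```

A caller would treat `null` as "no static corpus available" rather than silently substituting blank details.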
useFullQuestion hook signature changed from `Question | undefined` to
`{ question, status }`. Callers updated: practice and plans pages
(both render an amber "details unavailable" banner when status
is 'error').
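The new hook contract can be sketched like this. Only the `{ question, status }` shape comes from the commit; the `HydrationStatus` values and the `pickRenderState` helper are illustrative assumptions:

```typescript
// Illustrative sketch of the changed hook contract (status values and
// pickRenderState are assumptions, not the repo's actual identifiers).
type HydrationStatus = "loading" | "ready" | "error";

interface FullQuestionResult<Q> {
  question: Q | undefined; // hydrated full question once the Worker responds
  status: HydrationStatus;
}

// Callers render the summary until hydration lands, and show the amber
// banner on failure instead of silently filling blanks.
function pickRenderState<Q>(
  summary: Q,
  res: FullQuestionResult<Q>,
): { current: Q; showBanner: boolean } {
  return {
    current: res.question ?? summary,
    showBanner: res.status === "error",
  };
}
```

This keeps the old `hydrated ?? summary` fallback for rendering while making the error state explicit rather than implicit.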
Deleted dead cutover scaffolding: corpus-source.ts (router with no UI
consumers), corpus-vault.ts (worker-only mirror, never wired up),
useVaultQuestion.ts (unused migration hook), vault-fallback.ts (only
consumer was corpus-source.ts).
Deleted stale docs: staffml/scripts/DEPRECATED.md, vault-cli/docs/
CUTOVER_QA.md, three vault/docs/RESUME_PLAN_*.md.
Verified locally: tsc clean, vitest 37/37, next build produces all
15 static routes.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Committed by GitHub
parent 6e4ab3f779
commit c824ac6ed1

.github/workflows/staffml-publish-live.yml (vendored, 26 changed lines)
@@ -88,13 +88,6 @@ jobs:
      - name: 🛠️ Install vault-cli
        run: pip install -e interviews/vault-cli/

      - name: 🔄 Regenerate corpus from YAMLs (vault build --legacy-json)
        # Ships interviews/staffml/src/data/corpus.json (full) +
        # corpus-summary.json (bundled by the site) straight from the
        # committed YAMLs. The site always deploys with current YAML state,
        # even if the committed JSON artifacts drifted on the last commit.
        run: vault build --vault-dir interviews/vault --release-id publish-live --legacy-json

      - name: 🔍 Type check
        working-directory: interviews/staffml
        run: npx tsc --noEmit

@@ -115,9 +108,9 @@ jobs:
          # required on /ask calls from the AskInterviewer panel.
          NEXT_PUBLIC_INTERVIEWER_ENDPOINT: https://mlsysbook.ai/api/staffml-interviewer
          # Vault cutover (PRs #1433, #1434): site reads live data from the
          # Cloudflare Worker in production; bundled corpus.json is the
          # offline fallback. NEXT_PUBLIC_VAULT_FALLBACK is intentionally
          # unset so vault-fallback.ts defaults to 'vault-api'.
          # Cloudflare Worker in production. There is no static rollback in
          # prod — corpus.json is neither emitted nor bundled. If the Worker
          # is unreachable, the UI surfaces a "details unavailable" banner.
          NEXT_PUBLIC_VAULT_API: https://staffml-vault.mlsysbook-ai-account.workers.dev
          NEXT_PUBLIC_VAULT_RELEASE: "1.0.2"
        run: npm run build
@@ -164,20 +157,25 @@ jobs:
        run: python3 interviews/staffml/scripts/validate-vault.py

      - name: 🧪 Smoke tests
        # Reads the bundled corpus-summary.json (committed) — the prod build
        # ships this as the synchronous catalog. Heavy fields (scenario,
        # details prose) live on the Worker and are not re-validated here;
        # `vault check --strict` covers per-question YAML validation in the
        # validate-vault workflow.
        run: |
          python3 -c "
          import json, sys, os
          import json

          with open('interviews/staffml/src/data/corpus.json') as f:
          with open('interviews/staffml/src/data/corpus-summary.json') as f:
              corpus = json.load(f)
          assert len(corpus) >= 4000, f'Corpus too small: {len(corpus)}'
          print(f'✅ Corpus: {len(corpus)} questions')

          required = ['id', 'title', 'level', 'track', 'scenario', 'competency_area', 'topic', 'zone', 'details']
          required = ['id', 'title', 'level', 'track', 'competency_area', 'topic', 'zone']
          for q in corpus:
              for f in required:
                  assert q.get(f), f'{q.get(\"id\", \"???\")} missing {f}'
          print('✅ All questions have required fields')
          print('✅ All questions have required structural fields')

          valid_levels = {'L1', 'L2', 'L3', 'L4', 'L5', 'L6', 'L6+'}
          for q in corpus:
@@ -17,9 +17,16 @@ NEXT_PUBLIC_VAULT_API=https://staffml-vault.mlsysbook-ai-account.workers.dev
# mismatch surfaces in X-Vault-Release SLI but still serves.
NEXT_PUBLIC_VAULT_RELEASE=1.0.2

# Data-source switch:
#   unset or 'vault' → worker-primary, bundled corpus.json as fallback (DEFAULT)
#   'static'         → bundled-only (rollback / offline / worker-unreachable dev)
# The bundled corpus.json is preserved on disk as a safety net — it is not
# deleted, but the site reads from the worker when it's reachable.
# OPT-IN offline dev mode (local-only — production never sets this):
#   unset (DEFAULT) → site reads details from the Worker. If the Worker is
#                     unreachable, detail prose is omitted and the UI shows
#                     a "details unavailable" banner.
#   'static'        → site reads details from a bundled corpus.json instead
#                     of the Worker. Requires materializing corpus.json
#                     locally first:
#                         vault build --vault-dir interviews/vault \
#                           --release-id local-dev --legacy-json
#                     Use this when working offline or against an unreachable
#                     Worker. Production deploys neither emit nor bundle
#                     corpus.json — there is no static rollback path in prod.
# NEXT_PUBLIC_VAULT_FALLBACK=static
@@ -1,36 +0,0 @@
# Deprecated scripts — `interviews/staffml/scripts/`

These pre-date the YAML migration (ARCHITECTURE.md v2.x, Phase 1). They ran
against the monolithic `interviews/vault/corpus.json` (now a generated
artifact) or pushed data into `src/data/corpus.json` (now emitted by
`vault build --legacy-json`).

## Replaced-by map

| Legacy script | Purpose | Replacement |
|---|---|---|
| `sync-vault.py` | Copied vault/corpus.json → src/data/ with filter | `vault build --legacy-json` emits site-compatible JSON directly |
| `generate-manifest.py` | Built src/data/vault-manifest.json | Built by `vault publish` as a release artifact |
| `validate-vault.py` | Sanity check on corpus shape | Covered by `vault check --strict` invariants |
| `format-napkin-math.py` | One-shot formatter | Obsolete |
| `sync-periodic-table.mjs` | Unrelated (periodic-table site feature) | Still active — NOT deprecated |

## Current flow

```bash
vault build --legacy-json   # from repo root
# Regenerates:
#   interviews/staffml/src/data/corpus.json (9199 questions, site-compatible shape)
#   interviews/vault/vault.db (25 MB SQLite build artifact)
# Verifies release_hash against corpus-equivalence-hash.txt
```

The site layout has NOT changed: `corpus.ts` still does
`import corpusData from '../data/corpus.json'`. The only difference is that
`corpus.json` is now derived from YAML rather than hand-edited — a
pre-commit hook refuses direct edits to it.

Phase-4 cutover replaces the bundled JSON with Worker-API reads via
`corpus-source.ts` + `vault-api.ts`. That's a separate step;
`corpus.json` stays through at least 2 post-cutover releases as the
rollback fallback.
@@ -1,15 +1,16 @@
#!/usr/bin/env python3
"""Validate vault data integrity for StaffML deployment.
"""Sparse vault sanity check for the StaffML deploy.

When ``corpus.json`` is present (e.g. after ``vault build --legacy-json``), runs
full cross-checks against taxonomy and manifest.
Validates the small committed metadata files that ship in the repo:
``taxonomy.json`` and ``vault-manifest.json``. Confirms taxonomy has
concepts, manifest has a question count, and track distributions add up.

When ``corpus.json`` is absent — the normal case for a clean clone after
2026-04-26, when corpus was retired as a tracked file — runs **sparse** checks
only: committed ``taxonomy.json`` and ``vault-manifest.json`` must load and
look self-consistent. Full per-question validation is expected from
``vault check --strict`` in CI (``staffml-validate-vault.yml``) and from this
script after a local or CI ``vault build -- ... --legacy-json``.
Per-question deep validation (schema, chain integrity, math, etc.) is
covered by ``vault check --strict`` (run in CI via
``staffml-validate-vault.yml``), which validates directly against the
YAML source files in ``interviews/vault/`` rather than a generated JSON
artifact. This script is the cheap pre-deploy gate; ``vault check`` is
the authoritative one.

Exit code 0 = all checks pass, 1 = errors found.
@@ -18,7 +19,6 @@ Usage: python3 interviews/staffml/scripts/validate-vault.py

import json
import sys
from collections import Counter
from pathlib import Path

STAFFML_DATA = Path(__file__).parent.parent / "src" / "data"
@@ -41,12 +41,14 @@ def ok(msg: str) -> None:
    print(f" ✅ {msg}")


def run_sparse_validation(taxonomy_path: Path, manifest_path: Path) -> int:
    """Validate committed JSON when the full bundled corpus is not on disk."""
    print("\n🔍 Sparse mode (no corpus.json)")
def main() -> int:
    taxonomy_path = STAFFML_DATA / "taxonomy.json"
    manifest_path = STAFFML_DATA / "vault-manifest.json"

    print("\n🔍 Sparse vault check (committed metadata only)")
    print(
        " Per-question checks require a build artifact. Regenerate with:\n"
        " vault build --vault-dir interviews/vault --release-id <id> --legacy-json\n"
        " Per-question deep validation lives in `vault check --strict` "
        "(staffml-validate-vault.yml).\n"
    )

    if not taxonomy_path.exists():
@@ -88,7 +90,6 @@ def run_sparse_validation(taxonomy_path: Path, manifest_path: Path) -> int:
        ok(f"Vault v{ver} — hash {h}")

    print(f"\n{'=' * 50}")
    print(f" Mode: sparse (no corpus.json)")
    print(f" Errors: {len(errors)}")
    print(f" Warnings: {len(warnings)}")
    print(f"{'=' * 50}")
@@ -97,213 +98,13 @@ def run_sparse_validation(taxonomy_path: Path, manifest_path: Path) -> int:
        print("\n❌ Sparse validation failed")
        return 1
    print(
        "\n🎯 Sparse checks passed — for full deploy-grade validation, run vault build "
        "--legacy-json and re-run this script, or rely on staffml-validate-vault (CI)."
        "\n🎯 Sparse checks passed — for deep per-question validation run "
        "`vault check --strict` (or rely on staffml-validate-vault in CI)."
    )
    if warnings:
        print(f" ({len(warnings)} warnings — review recommended)")
    return 0


# ── 1. Load data ─────────────────────────────────────────────

corpus_path = STAFFML_DATA / "corpus.json"
taxonomy_path = STAFFML_DATA / "taxonomy.json"
manifest_path = STAFFML_DATA / "vault-manifest.json"

if not corpus_path.exists():
    sys.exit(run_sparse_validation(taxonomy_path, manifest_path))

if not taxonomy_path.exists():
    print(f" ❌ taxonomy.json not found at {taxonomy_path}", file=sys.stderr)
    sys.exit(1)

print("\n🔍 Loading data files...")

with open(corpus_path, encoding="utf-8") as f:
    corpus = json.load(f)
with open(taxonomy_path, encoding="utf-8") as f:
    taxonomy = json.load(f)

manifest = None
if manifest_path.exists():
    with open(manifest_path, encoding="utf-8") as f:
        manifest = json.load(f)

ok(f"Loaded {len(corpus)} questions, {len(taxonomy.get('concepts', []))} concepts")

# ── 2. Schema checks ─────────────────────────────────────────

print("\n📋 Schema validation...")

REQUIRED_FIELDS = [
    "id",
    "title",
    "level",
    "track",
    "scenario",
    "competency_area",
    "details",
]
VALID_LEVELS = {"L1", "L2", "L3", "L4", "L5", "L6", "L6+"}
VALID_TRACKS = {"cloud", "edge", "mobile", "tinyml", "global"}
DETAIL_FIELDS = ["common_mistake", "realistic_solution"]

missing_fields = 0
bad_levels = 0
bad_tracks = 0
short_scenarios = 0
empty_answers = 0

for q in corpus:
    qid = q.get("id", "???")

    for field in REQUIRED_FIELDS:
        if not q.get(field):
            error(f"{qid}: missing required field '{field}'")
            missing_fields += 1

    if q.get("level") not in VALID_LEVELS:
        error(f"{qid}: invalid level '{q.get('level')}'")
        bad_levels += 1

    if q.get("track") not in VALID_TRACKS:
        error(f"{qid}: invalid track '{q.get('track')}'")
        bad_tracks += 1

    scenario = q.get("scenario", "")
    if len(scenario.strip()) < 30:
        warn(f"{qid}: scenario too short ({len(scenario)} chars)")
        short_scenarios += 1

    details = q.get("details", {})
    for df in DETAIL_FIELDS:
        if not details.get(df) or len(str(details.get(df, "")).strip()) < 5:
            warn(f"{qid}: details.{df} empty or too short")
            empty_answers += 1

if missing_fields == 0 and bad_levels == 0 and bad_tracks == 0:
    ok("All questions have valid required fields, levels, and tracks")
else:
    error(
        f"{missing_fields} missing fields, {bad_levels} bad levels, {bad_tracks} bad tracks"
    )

# ── 3. Uniqueness checks ─────────────────────────────────────

print("\n🔑 Uniqueness checks...")

ids = [q["id"] for q in corpus]
id_counts = Counter(ids)
dupes = {k: v for k, v in id_counts.items() if v > 1}
if dupes:
    error(f"{len(dupes)} duplicate IDs: {list(dupes.keys())[:5]}")
else:
    ok(f"All {len(ids)} question IDs are unique")

# ── 4. Taxonomy consistency ──────────────────────────────────

print("\n🏷️ Taxonomy consistency...")

concepts = {c["id"] for c in taxonomy.get("concepts", [])}
corpus_concepts = {q.get("taxonomy_concept") for q in corpus if q.get("taxonomy_concept")}
unmapped = corpus_concepts - concepts

if unmapped:
    warn(f"{len(unmapped)} corpus concepts not in taxonomy: {list(unmapped)[:5]}")
else:
    ok(f"All {len(corpus_concepts)} corpus concepts exist in taxonomy")

corpus_areas = Counter(q.get("competency_area", "???") for q in corpus)
ok(f"{len(corpus_areas)} competency areas in use")

# ── 5. Chain integrity ───────────────────────────────────────

print("\n🔗 Chain integrity...")

chains: dict[str, list] = {}
for q in corpus:
    cids = q.get("chain_ids", "")
    if isinstance(cids, list):
        for cid in cids:
            if cid:
                chains.setdefault(cid, []).append(q)
    elif cids:
        chains.setdefault(cids, []).append(q)

solo_chains = sum(1 for c in chains.values() if len(c) <= 1)
if solo_chains > 0:
    warn(f"{solo_chains} single-question chains (should be 2+)")

duplicate_chains = 0
for cid, qs in chains.items():
    pos_list = []
    for q in qs:
        cp = q.get("chain_positions", -1)
        if isinstance(cp, dict):
            pos_list.append(int(cp.get(cid, -1)))
        else:
            pos_list.append(int(cp) if cp != "" else -1)
    if len(pos_list) != len(set(pos_list)):
        duplicate_chains += 1
        if duplicate_chains <= 3:
            warn(f"Chain '{cid}': duplicate positions {sorted(pos_list)}")

if duplicate_chains == 0:
    ok(f"All {len(chains)} chains have unique positions")
else:
    warn(f"{duplicate_chains} chains have duplicate positions")

# ── 6. Manifest consistency ──────────────────────────────────

print("\n📦 Manifest consistency...")

if manifest:
    if manifest.get("questionCount") != len(corpus):
        error(
            f"Manifest says {manifest['questionCount']} questions, corpus has {len(corpus)}"
        )
    else:
        ok(f"Manifest matches corpus: {len(corpus)} questions")

    if manifest.get("chainCount") != len(chains):
        warn(
            f"Manifest says {manifest['chainCount']} chains, found {len(chains)}"
        )

    ok(f"Vault v{manifest.get('version', '?')} — hash {manifest.get('contentHash', '?')}")
else:
    warn("No vault-manifest.json found — run vault build --legacy-json")

# ── 7. Distribution sanity ───────────────────────────────────

print("\n📊 Distribution sanity...")

level_dist = Counter(q.get("level") for q in corpus)
track_dist = Counter(q.get("track") for q in corpus)

for track, count in track_dist.items():
    pct = count / len(corpus) * 100
    if pct < 2:
        warn(f"Track '{track}' has only {count} questions ({pct:.1f}%)")

ok(f"Levels: {dict(sorted(level_dist.items()))}")
ok(f"Tracks: {dict(sorted(track_dist.items()))}")

# ── Summary ──────────────────────────────────────────────────

print(f"\n{'=' * 50}")
print(f" Questions: {len(corpus)}")
print(f" Chains: {len(chains)}")
print(f" Concepts: {len(concepts)}")
print(f" Errors: {len(errors)}")
print(f" Warnings: {len(warnings)}")
print(f"{'=' * 50}")

if errors:
    print(f"\n❌ {len(errors)} errors found — vault is NOT deployment-ready")
    sys.exit(1)
print("\n🎯 All checks passed — vault is deployment-ready")
if warnings:
    print(f" ({len(warnings)} warnings — review recommended)")
sys.exit(0)
if __name__ == "__main__":
    sys.exit(main())
@@ -54,7 +54,8 @@ export default function PlansPage() {
  };

  const currentSummary = questions[currentIdx];
  const current = useFullQuestion(currentSummary) ?? currentSummary;
  const { question: hydrated, status: hydrationStatus } = useFullQuestion(currentSummary);
  const current = hydrated ?? currentSummary;
  const maxScore = napkinResult?.maxSelfScore ?? 3;

  const handleReveal = () => {

@@ -217,6 +218,11 @@ export default function PlansPage() {
            <span className="text-[10px] font-mono text-textTertiary">{current.track} / {current.level}</span>
          </div>
          <h2 className="text-2xl lg:text-3xl font-bold text-textPrimary mb-6 tracking-tight">{current.title}</h2>
          {hydrationStatus === "error" && (
            <div className="mb-4 rounded-md border border-amber-300 bg-amber-50 px-3 py-2 text-sm text-amber-800">
              Could not load the full question details. Reload to retry.
            </div>
          )}
          <div className="prose max-w-none">
            {current.scenario ? (
              <p className="text-textSecondary leading-relaxed text-base">{cleanScenario(current.scenario)}</p>
@@ -183,7 +183,8 @@ function PracticePage() {
  // (no scenario/details). `current` is hydrated from the worker via
  // useFullQuestion — same shape, but scenario + details populated.
  const [currentSummary, setCurrentSummary] = useState<Question | null>(null);
  const current = useFullQuestion(currentSummary) ?? currentSummary;
  const { question: hydrated, status: hydrationStatus } = useFullQuestion(currentSummary);
  const current = hydrated ?? currentSummary;
  const setCurrent = setCurrentSummary;
  const skipFilterCount = useRef(0);
  const questionShownAt = useRef(Date.now());

@@ -1056,6 +1057,14 @@ function PracticePage() {
                  {current.title}
                </h2>

                {hydrationStatus === "error" && (
                  <div className="mb-4 rounded-md border border-amber-300 bg-amber-50 px-3 py-2 text-sm text-amber-800">
                    Could not load the full question details. The
                    question prompt is shown, but scenario and answer
                    notes are unavailable. Reload to retry.
                  </div>
                )}

                {/*
                  STICKY Your-task callout. Pins to the top of the
                  scroll container so the question stays visible
@@ -1,20 +1,23 @@
"use client";

/**
 * CorpusProvider — Phase-4 hybrid data layer.
 * CorpusProvider — hybrid data layer.
 *
 * The bundled corpus.json remains the primary data source for synchronous
 * operations (getQuestions, getQuestionsByFilter, etc.). The Worker API
 * enhances two specific operations:
 * The bundled `corpus-summary.json` is the primary data source for
 * synchronous operations (getQuestions, getQuestionsByFilter, taxonomy,
 * navigation). Heavy fields (scenario, details prose) come from the
 * Cloudflare Worker via vault-api.ts.
 *
 * The Worker enhances two specific operations:
 *
 * 1. **Search** — FTS5 full-text search via /search endpoint replaces the
 *    client-side O(n) string matching.
 * 2. **Service worker registration** — enables offline caching of API
 *    responses for future full-API cutover.
 *    responses for the per-question detail fetches.
 *
 * When NEXT_PUBLIC_VAULT_API is set and NEXT_PUBLIC_VAULT_FALLBACK is not
 * "static", the provider registers the service worker and exposes the
 * vault-enhanced search. Otherwise everything falls back silently.
 * NEXT_PUBLIC_VAULT_FALLBACK=static is an OPT-IN local-dev affordance for
 * working without a reachable Worker (requires `vault build --legacy-json`
 * to materialize corpus.json). Production never sets it.
 */

import { createContext, useContext, useEffect, useState, useCallback, type ReactNode } from "react";
@@ -1,56 +0,0 @@
/**
 * Corpus data-source switch (Phase-4 cutover router).
 *
 * Components that want to be cutover-aware import from this module instead of
 * ``corpus.ts``. Returns the vault-API-backed path when
 * ``NEXT_PUBLIC_VAULT_FALLBACK`` is NOT 'static', falls back to the bundled
 * path otherwise.
 *
 * Components untouched by the cutover continue importing ``corpus.ts`` directly
 * (unchanged behavior) until the user is ready to flip them. This keeps the
 * Phase-4 cutover reviewable one component at a time.
 */

import { usingFallback } from "./vault-fallback";
import * as legacy from "./corpus";
import * as vault from "./corpus-vault";

export function getCorpusSource(): "static" | "vault-api" {
  return usingFallback() ? "static" : "vault-api";
}

export async function getQuestionById(id: string): Promise<unknown | null> {
  if (usingFallback()) {
    const qs = legacy.getQuestions();
    return qs.find(q => q.id === id) ?? null;
  }
  return vault.getQuestionById(id);
}

export async function listQuestions(
  params: { track?: string; level?: string; zone?: string; limit?: number } = {},
): Promise<unknown[]> {
  if (usingFallback()) {
    let qs = legacy.getQuestions() as any[];
    if (params.track) qs = qs.filter(q => q.track === params.track);
    if (params.level) qs = qs.filter(q => q.level === params.level);
    if (params.zone) qs = qs.filter(q => q.zone === params.zone);
    if (params.limit) qs = qs.slice(0, params.limit);
    return qs;
  }
  return vault.listQuestions(params);
}

export async function searchQuestions(q: string, limit = 20): Promise<unknown[]> {
  if (usingFallback()) {
    const qs = legacy.getQuestions() as any[];
    const needle = q.toLowerCase();
    return qs
      .filter(item =>
        (item.title ?? "").toLowerCase().includes(needle)
        || (item.scenario ?? "").toLowerCase().includes(needle)
      )
      .slice(0, limit);
  }
  return vault.searchQuestions(q, limit);
}
@@ -1,161 +0,0 @@
/**
 * Vault-API-backed corpus data source.
 *
 * Mirror of the public surface of ``corpus.ts`` but sourced from the
 * staffml-vault Worker via ``vault-api.ts`` instead of the bundled
 * ``corpus.json``. Not wired into any component until cutover — the
 * switch happens via ``corpus-source.ts``.
 *
 * Post-v1.0 (2026-04-21): the vault schema now carries track/level/zone
 * as YAML fields and uses plural `chains: [{id, position}]`, so this
 * adapter's job shrinks considerably. The defaulting to
 * `track='global'`/`level='l1'`/`zone='recall'` that existed here was
 * exactly the silent-mis-classification pattern that hid the v0.1
 * migration bug; those defaults are gone.
 */

import type { Question as VaultQuestion } from "@staffml/vault-types";
import { makeClientFromEnv, VaultApiClient } from "./vault-api";

// v1.0: classification lives on the Question itself.
type EnrichedVaultQuestion = VaultQuestion & {
  track: string;
  level: string;
  zone: string;
  competency_area: string;
  bloom_level?: string;
  phase?: string;
  question?: string;
  visual?: {
    kind: "svg"; // closed enum as of v0.1.2 (mermaid retired)
    path: string;
    alt: string; // ≥10 chars (a11y)
    caption: string; // required as of v0.1.2, ≥5 chars
  };
  chains?: Array<{ id: string; position: number }>;
  validated?: boolean;
  math_verified?: boolean;
  human_reviewed?: {
    status: string;
    by?: string | null;
    date?: string | null;
  };
};

// Shape the UI already expects (see corpus.ts).
export interface Question {
  id: string;
  track: string;
  level: string;
  title: string;
  topic: string;
  zone: string;
  competency_area: string;
  bloom_level?: string;
  phase?: string;
  scenario: string;
  question?: string;
  visual?: {
    kind: "svg"; // closed enum as of v0.1.2 (mermaid retired)
    path: string;
    alt: string; // ≥10 chars (a11y)
    caption: string; // required as of v0.1.2, ≥5 chars
  };
  chain_ids?: string[];
  chain_positions?: Record<string, number>;
  details: {
    common_mistake: string;
    realistic_solution: string;
    napkin_math?: string;
  };
  validated?: boolean;
  math_verified?: boolean;
  human_reviewed?: {
    status: string;
    by?: string | null;
    date?: string | null;
  };
}

function adapt(v: EnrichedVaultQuestion): Question {
  // Rebuild legacy chain_ids + chain_positions from the plural `chains` list.
  const chainIds: string[] = [];
  const chainPositions: Record<string, number> = {};
  for (const c of v.chains ?? []) {
    chainIds.push(c.id);
    chainPositions[c.id] = c.position;
  }
  return {
    id: v.id,
    track: v.track,
    level: v.level,
    title: v.title,
    topic: v.topic,
    zone: v.zone,
    competency_area: v.competency_area,
    bloom_level: v.bloom_level,
    phase: v.phase,
    scenario: v.scenario,
    question: v.question,
    visual: v.visual,
    chain_ids: chainIds.length ? chainIds : undefined,
    chain_positions: chainIds.length ? chainPositions : undefined,
    details: {
      common_mistake: v.details.common_mistake ?? "",
      realistic_solution: v.details.realistic_solution,
      napkin_math: v.details.napkin_math,
    },
    validated: v.validated,
    math_verified: v.math_verified,
    human_reviewed: v.human_reviewed,
  };
}

let _client: VaultApiClient | null | undefined = undefined;
function client(): VaultApiClient {
  if (_client === undefined) _client = makeClientFromEnv();
  if (_client === null) {
    throw new Error(
      "NEXT_PUBLIC_VAULT_API is not set. Point it at the worker or set "
      + "NEXT_PUBLIC_VAULT_FALLBACK=static to use the bundled corpus.",
    );
  }
  return _client;
}

// In-memory cache; SWR (in real consumption via hooks) layers on top.
const _byId = new Map<string, Question>();

export async function getQuestionById(id: string): Promise<Question | null> {
  if (_byId.has(id)) return _byId.get(id)!;
  try {
    const v = await client().getQuestion(id);
    const q = adapt(v as EnrichedVaultQuestion);
    _byId.set(id, q);
    return q;
  } catch {
    return null;
  }
}

export async function listQuestions(params: {
  track?: string; level?: string; zone?: string; limit?: number;
} = {}): Promise<Question[]> {
  const res = await client().listQuestions(params);
  return (res.items as EnrichedVaultQuestion[]).map(adapt);
}

export async function searchQuestions(q: string, limit = 20): Promise<Question[]> {
  const res = await client().search(q, limit);
  return (res.results as EnrichedVaultQuestion[]).map(adapt);
}

/**
 * Synchronous getQuestions() — compatibility shim for legacy call sites that
 * expect an array rather than a Promise. Returns the currently-cached set
 * (populated by prior async calls). Callers doing full-corpus scans must
 * migrate to listQuestions().
 */
export function getQuestions(): Question[] {
  return Array.from(_byId.values());
}
@@ -429,17 +429,27 @@ const VAULT_API = process.env.NEXT_PUBLIC_VAULT_API
const _detailsCache = new Map<string, Question>();
let _staticDetailsCache: Map<string, Question> | null = null;

// Opt-in offline / local-dev mode. Set NEXT_PUBLIC_VAULT_FALLBACK=static and
// run `vault build --legacy-json` to materialize corpus.json on disk. Not a
// prod safety net: production deploys neither emit nor bundle corpus.json.
function shouldUseStaticDetails(): boolean {
  if (process.env.NEXT_PUBLIC_VAULT_FALLBACK?.toLowerCase() === "static") return true;
  if (typeof window === "undefined") return false;
  return window.location.hostname === "localhost" || window.location.hostname === "127.0.0.1";
  return process.env.NEXT_PUBLIC_VAULT_FALLBACK?.toLowerCase() === "static";
}

async function getStaticFullDetail(id: string, summary: Question): Promise<Question | undefined> {
  if (!_staticDetailsCache) {
    const mod = await import("../data/corpus.json");
    const fullQuestions = mod.default as unknown as Question[];
    _staticDetailsCache = new Map(fullQuestions.map((q) => [q.id, q]));
    // Function-constructor dynamic import: hides the path from Turbopack's
    // static analyzer so prod builds don't require corpus.json to exist.
    // corpus.json is materialized on disk only when a contributor runs
    // `vault build --legacy-json` locally with NEXT_PUBLIC_VAULT_FALLBACK=
    // static. If the file is missing at runtime, the import rejects and
    // the caller surfaces an error to the UI.
    const dynImport = new Function(
      "p",
      "return import(p)",
    ) as (p: string) => Promise<{ default: Question[] }>;
    const mod = await dynImport("../data/corpus.json");
    _staticDetailsCache = new Map(mod.default.map((q) => [q.id, q]));
  }
  const full = _staticDetailsCache.get(id);
  if (!full) return undefined;

@@ -457,8 +467,10 @@ async function getStaticFullDetail(id: string, summary: Question): Promise<Quest

/**
 * Fetch the FULL question (with `scenario` and `details.*`) from the
 * Cloudflare Worker. Returns the summary-only record on network failure
 * so the UI can still render id/title/level/zone.
 * Cloudflare Worker. Returns the cache-merged Question on success.
 * Throws on Worker error — useFullQuestion catches and renders the
 * "details unavailable" state. (Static fallback is opt-in via
 * NEXT_PUBLIC_VAULT_FALLBACK=static and is handled earlier.)
 */
export async function getQuestionFullDetail(id: string): Promise<Question | undefined> {
  const cached = _detailsCache.get(id);

@@ -471,51 +483,44 @@ export async function getQuestionFullDetail(id: string): Promise<Question | unde
    return getStaticFullDetail(id, summary);
  }

  try {
const res = await fetch(`${VAULT_API}/questions/${encodeURIComponent(id)}`, {
|
||||
signal: AbortSignal.timeout(5_000),
|
||||
});
|
||||
if (!res.ok) return (await getStaticFullDetail(id, summary)) ?? summary;
|
||||
// Worker returns a DENORMALIZED row (flat fields straight from the D1
|
||||
// questions table) — common_mistake / realistic_solution / napkin_math
|
||||
// live at the top level, NOT under `details`. Re-nest to match the
|
||||
// site's Question shape before returning, otherwise callers get
|
||||
// `current.details.napkin_math` → TypeError on an undefined details.
|
||||
const full = await res.json() as {
|
||||
scenario?: string;
|
||||
common_mistake?: string;
|
||||
realistic_solution?: string;
|
||||
napkin_math?: string;
|
||||
details?: Question["details"]; // future-proof if worker changes
|
||||
};
|
||||
const workerDetails = full.details ?? {
|
||||
common_mistake: full.common_mistake ?? "",
|
||||
realistic_solution: full.realistic_solution ?? "",
|
||||
napkin_math: full.napkin_math ?? "",
|
||||
};
|
||||
const merged: Question = {
|
||||
...summary,
|
||||
scenario: full.scenario ?? summary.scenario,
|
||||
details: {
|
||||
// Preserve MCQ options/correct_index that came in the summary.
|
||||
...summary.details,
|
||||
...workerDetails,
|
||||
},
|
||||
};
|
||||
_detailsCache.set(id, merged);
|
||||
return merged;
|
||||
} catch {
|
||||
// Worker unreachable → serve the bundled full corpus when available.
|
||||
// This keeps local previews usable even when the Worker blocks localhost
|
||||
// via CORS, and gives production a graceful fallback on transient outages.
|
||||
return (await getStaticFullDetail(id, summary)) ?? summary;
|
||||
}
|
||||
const res = await fetch(`${VAULT_API}/questions/${encodeURIComponent(id)}`, {
|
||||
signal: AbortSignal.timeout(5_000),
|
||||
});
|
||||
if (!res.ok) throw new Error(`worker ${res.status}`);
|
||||
// Worker returns a DENORMALIZED row (flat fields straight from the D1
|
||||
// questions table) — common_mistake / realistic_solution / napkin_math
|
||||
// live at the top level, NOT under `details`. Re-nest to match the
|
||||
// site's Question shape before returning, otherwise callers get
|
||||
// `current.details.napkin_math` → TypeError on an undefined details.
|
||||
const full = await res.json() as {
|
||||
scenario?: string;
|
||||
common_mistake?: string;
|
||||
realistic_solution?: string;
|
||||
napkin_math?: string;
|
||||
details?: Question["details"]; // future-proof if worker changes
|
||||
};
|
||||
const workerDetails = full.details ?? {
|
||||
common_mistake: full.common_mistake ?? "",
|
||||
realistic_solution: full.realistic_solution ?? "",
|
||||
napkin_math: full.napkin_math ?? "",
|
||||
};
|
||||
const merged: Question = {
|
||||
...summary,
|
||||
scenario: full.scenario ?? summary.scenario,
|
||||
details: {
|
||||
// Preserve MCQ options/correct_index that came in the summary.
|
||||
...summary.details,
|
||||
...workerDetails,
|
||||
},
|
||||
};
|
||||
_detailsCache.set(id, merged);
|
||||
return merged;
|
||||
}
|
||||
|
||||
/**
|
||||
* Pre-warm the details cache for a batch of IDs (e.g., gauntlet session).
|
||||
* Fires fetches in parallel, resolves when all complete (or time out).
|
||||
* Fires fetches in parallel; individual failures don't reject the batch.
|
||||
*/
|
||||
export async function prefetchQuestionDetails(ids: string[]): Promise<void> {
|
||||
await Promise.all(ids.map(id => getQuestionFullDetail(id)));
|
||||
await Promise.allSettled(ids.map(id => getQuestionFullDetail(id)));
|
||||
}
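A quick illustration (not from this diff) of what the `Promise.allSettled` swap in `prefetchQuestionDetails` buys: `Promise.all` rejects the whole batch on the first failed detail fetch, losing the remaining cache warms, while `allSettled` lets every fetch run to completion and reports per-promise outcomes. `Settled`, `countFailures`, and `demo` are hypothetical names for this sketch only.

```typescript
// Shape of one allSettled outcome (mirrors PromiseSettledResult).
type Settled<T> =
  | { status: "fulfilled"; value: T }
  | { status: "rejected"; reason: unknown };

// Pure helper: how many members of the batch failed.
function countFailures<T>(settled: Settled<T>[]): number {
  return settled.filter((s) => s.status === "rejected").length;
}

async function demo(): Promise<number> {
  // The batch resolves even though one member rejected; with
  // Promise.all this would throw instead.
  const results = await Promise.allSettled([
    Promise.resolve("q-1"),
    Promise.reject(new Error("worker 503")),
    Promise.resolve("q-3"),
  ]);
  return countFailures(results);
}
```

Callers that need per-question error reporting can inspect the settled array; the prefetch path above deliberately ignores it and relies on `useFullQuestion` to surface errors per question.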

@@ -3,13 +3,19 @@
  *
  * The bundled corpus is summary-only (id/title/level/zone/topic/… — no
  * scenario/details). When a component needs the heavy fields, wrap the
- * summary with this hook. It fetches from the worker and re-renders.
+ * summary with this hook. It fetches from the Worker and re-renders.
  *
+ * Returns { question, status }:
+ *  - question: the best record we have (summary on first render, or after
+ *    a failed fetch; full record once the Worker resolves)
+ *  - status: 'loading' while the fetch is in flight, 'ready' on success,
+ *    'error' if the Worker is unreachable. Callers can render an error
+ *    hint ("Details unavailable — retry") when status === 'error'.
+ *
  * Usage:
- *   const current = getQuestionById(qId);  // sync, summary only
- *   const full = useFullQuestion(current); // async hydrate
- *   // First render: full === current (scenario/details undefined)
- *   // After fetch: full === { ...current, scenario, details }
+ *   const summary = getQuestionById(qId);
+ *   const { question, status } = useFullQuestion(summary);
+ *   if (status === 'error') return <DetailsUnavailable onRetry={…} />;
  */

 "use client";
@@ -17,39 +23,55 @@
 import { useEffect, useState } from "react";
 import { getQuestionFullDetail, type Question } from "../corpus";

-export function useFullQuestion(summary: Question | undefined | null): Question | undefined {
-  const [hydrated, setHydrated] = useState<Question | undefined>(
-    summary ?? undefined,
-  );
+export type UseFullQuestionStatus = "loading" | "ready" | "error";
+
+export interface UseFullQuestionResult {
+  question: Question | undefined;
+  status: UseFullQuestionStatus;
+}
+
+export function useFullQuestion(
+  summary: Question | undefined | null,
+): UseFullQuestionResult {
+  const [result, setResult] = useState<UseFullQuestionResult>(() => ({
+    question: summary ?? undefined,
+    status: summary ? "loading" : "ready",
+  }));

   useEffect(() => {
     if (!summary) {
-      setHydrated(undefined);
+      setResult({ question: undefined, status: "ready" });
       return;
     }
-    // If we already have scenario cached in the summary, skip fetch.
+    // Already hydrated in the summary itself (rare, but possible if a
+    // future bundle ships details inline). Skip the fetch.
     if (summary.scenario && summary.details?.realistic_solution) {
-      setHydrated(summary);
+      setResult({ question: summary, status: "ready" });
       return;
     }
     // Seed with summary so listing UI renders instantly; then hydrate.
-    setHydrated(summary);
+    setResult({ question: summary, status: "loading" });
     let cancelled = false;
-    getQuestionFullDetail(summary.id).then(full => {
-      if (cancelled || !full) return;
-      // Merge rather than replace: the worker returns the heavy fields
-      // (scenario, details) but does not necessarily carry every
-      // summary-bundle field. Summary fields like `question` (the
-      // explicit-ask prompt) live in the bundle and would otherwise be
-      // dropped by a straight replace. Spread summary first so worker
-      // values win where they overlap (they carry the real content),
-      // but summary-only fields survive.
-      setHydrated({ ...summary, ...full });
-    });
+    getQuestionFullDetail(summary.id)
+      .then(full => {
+        if (cancelled) return;
+        if (!full) {
+          setResult({ question: summary, status: "error" });
+          return;
+        }
+        // Merge rather than replace: the Worker returns the heavy fields
+        // (scenario, details) but does not necessarily carry every
+        // summary-bundle field. Spread summary first so Worker values
+        // win where they overlap, but summary-only fields survive.
+        setResult({ question: { ...summary, ...full }, status: "ready" });
+      })
+      .catch(() => {
+        if (cancelled) return;
+        setResult({ question: summary, status: "error" });
+      });
     return () => {
       cancelled = true;
     };
-  }, [summary?.id]); // re-run when the summary ID changes
+  }, [summary?.id]);

-  return hydrated;
+  return result;
 }
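A minimal sketch of the spread-merge contract the hook's comment describes: Worker-provided fields win where the two records overlap, summary-only fields survive. `QuestionLike` and `mergeSummaryWithFull` are illustrative names for this note, not the site's actual types.

```typescript
// Stand-in for the site's Question shape, trimmed to the fields that
// matter for the merge-order argument.
interface QuestionLike {
  id: string;
  title: string;
  question?: string; // summary-bundle-only field
  scenario?: string; // heavy field, Worker-provided
}

function mergeSummaryWithFull(
  summary: QuestionLike,
  full: QuestionLike,
): QuestionLike {
  // Spread order matters: keys present on `full` override `summary`,
  // but keys `full` lacks entirely (like `question`) fall through.
  return { ...summary, ...full };
}
```

This is why a straight replace would be wrong: a Worker row that omits the bundle-only `question` field would silently drop it.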

@@ -1,49 +0,0 @@
-/**
- * React hook — single-question fetch through the Phase-4 cutover router.
- *
- * Components that opt into the cutover import `useVaultQuestion` instead of
- * calling `corpus.getQuestions()` synchronously. On `NEXT_PUBLIC_VAULT_FALLBACK=static`
- * it returns the question from the bundled corpus (synchronous resolve);
- * otherwise it fetches via the Worker API through `corpus-source.ts`.
- *
- * Part of B.17 — the migration path for existing components is one-at-a-time
- * swap from `corpus.getQuestionById()` to `useVaultQuestion()`.
- */
-
-import { useEffect, useState } from "react";
-import { getQuestionById } from "../corpus-source";
-
-export interface UseVaultQuestionState<T> {
-  data: T | null;
-  loading: boolean;
-  error: Error | null;
-}
-
-export function useVaultQuestion<T = unknown>(id: string | null): UseVaultQuestionState<T> {
-  const [state, setState] = useState<UseVaultQuestionState<T>>({
-    data: null,
-    loading: id !== null,
-    error: null,
-  });
-
-  useEffect(() => {
-    if (id === null) {
-      setState({ data: null, loading: false, error: null });
-      return;
-    }
-    let cancelled = false;
-    setState(s => ({ ...s, loading: true, error: null }));
-    getQuestionById(id)
-      .then(result => {
-        if (cancelled) return;
-        setState({ data: result as T, loading: false, error: null });
-      })
-      .catch(err => {
-        if (cancelled) return;
-        setState({ data: null, loading: false, error: err instanceof Error ? err : new Error(String(err)) });
-      });
-    return () => { cancelled = true; };
-  }, [id]);
-
-  return state;
-}
@@ -1,5 +1,5 @@
 import taxonomyData from "../data/taxonomy.json";
-import corpusData from "../data/corpus.json";
+import corpusData from "../data/corpus-summary.json";
 import zonesData from "../data/zones.json";
 import {
   HardDrive, Cpu, Rocket, Layers, Timer, Shuffle,

@@ -1,22 +0,0 @@
-/**
- * Fallback-mode detection for the Phase-4 cutover.
- *
- * When NEXT_PUBLIC_VAULT_FALLBACK=static, the site reads from the bundled
- * corpus.json (pre-cutover behavior preserved). When unset or 'vault', the
- * site reads from the Worker API via vault-api.ts.
- *
- * One config change inverts the dataflow — no file restore required
- * (ARCHITECTURE.md §7.1 / §6.2, fix for C-1 "one-line revert" lie).
- */
-
-export type VaultSource = "static" | "vault-api";
-
-export function getVaultSource(): VaultSource {
-  const flag = process.env.NEXT_PUBLIC_VAULT_FALLBACK?.toLowerCase();
-  if (flag === "static") return "static";
-  return "vault-api";
-}
-
-export function usingFallback(): boolean {
-  return getVaultSource() === "static";
-}
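For the record, the deleted `vault-fallback.ts` router reduces to a single case-insensitive flag comparison, which is exactly what the surviving `shouldUseStaticDetails()` keeps. A sketch under assumed names (`Source` and `resolveSource` are illustrative, not the repo's API):

```typescript
type Source = "static" | "vault-api";

// Anything other than the literal (case-insensitive) "static" means the
// Worker API; unset behaves the same as "vault".
function resolveSource(flag: string | undefined): Source {
  return flag?.toLowerCase() === "static" ? "static" : "vault-api";
}
```

The opt-in-only semantics follow directly: with the env var unset, as in production deploys, the static path can never activate.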
@@ -1,210 +0,0 @@
-# Cutover-Day QA Checklist
-
-> **When to use**: Phase 4 cutover (static `corpus.json` → Worker API + D1).
-> **Who runs**: release operator, sequentially, alone — not in parallel with other site work.
-> **Expands**: ARCHITECTURE.md §19.4.
-> **Rehearsal**: this checklist runs end-to-end on **staging** as a dry run before production cutover.
-
----
-
-## 0. Pre-cutover gate checks (must all be GREEN before starting)
-
-- [ ] `vault verify <release>` on the release to be deployed → exit 0.
-- [ ] `vault smoke-test --env staging --samples 50` → 0 divergences.
-- [ ] All E2E Playwright tests green on staging against staging D1.
-- [ ] Lighthouse CI gates green on staging:
-  - [ ] practice/page.js transferred ≤ 300 KB gz.
-  - [ ] gauntlet/page.js ≤ 250 KB gz.
-  - [ ] landing/page.js ≤ 200 KB gz.
-  - [ ] FCP (95th pct, 4G) ≤ 1.2s.
-  - [ ] TTI (95th pct, 4G) ≤ 2.5s.
-  - [ ] Repeat-visit TTI ≤ 800ms.
-  - [ ] API round-trip p99 ≤ 250ms (question detail).
-- [ ] FTS5 load-test artifacts from Phase 3 still valid (re-run if >30 days old).
-- [ ] R2 pre-deploy snapshot of current production D1 exists and is restore-tested.
-- [ ] Rollback drill executed on staging within last 7 days (see §4).
-- [ ] Go/no-go reviewed with user. **GO** recorded in an operator log.
-
-If ANY item is red, **do not proceed**. Fix the underlying issue, re-run the gate.
-
----
-
-## 1. Ship the release
-
-> **Note on canary staging** (R10-F-2 + R11): percentage-based traffic split
-> is not implemented in `vault ship` (deferred to Phase 7 per ARCHITECTURE.md
-> §4.3). Current ship is all-or-nothing at the release-keyed Cache API layer.
-> Soak windows below still apply — they're now at 100% traffic, gated on
-> dashboard-green before advancing from staging to production.
-
-- [ ] `vault ship <release> --env staging` → journal reports all 3 legs DEPLOYED.
-- [ ] `vault smoke-test --env staging --samples 50` post-ship → 0 divergences.
-- [ ] Soak 15 min OR ≥100 sessions at 100% staging traffic, whichever longer.
-- [ ] All transport SLIs green (5xx <1%, p99 <500ms).
-- [ ] All data-plane SLIs green (row-count parity, content-hash sampling, FTS5 parity, schema_fingerprint).
-- [ ] `vault ship <release> --env production` → journal reports all 3 legs DEPLOYED.
-- [ ] `.ship-journal.json` written; tail the journal.
-  - [ ] D1 deploy leg: complete (R2 snapshot taken pre-migration).
-  - [ ] Next.js deploy leg: complete.
-  - [ ] Paper-tag push leg: complete (last).
-  - [ ] `point_of_no_return: true` in journal.
-- [ ] Soak 15 min OR ≥100 sessions post-production-ship.
-- [ ] `vault smoke-test --env production --samples 100` → 0 divergences.
-- [ ] If any SLI reds during soak: `vault rollback <prev-release> --env production --method snapshot --snapshot-ts <ts>` (§6.2 primary path).
-
----
-
-## 2. User-facing flows (manual QA on production)
-
-Operator runs each flow in a clean browser window (no extensions, no prior localStorage). Check the box if the flow completes without error AND the expected outcome is visible.
-
-### 2.1 Home / landing
-
-- [ ] `https://staffml.mlsysbook.ai/` loads.
-- [ ] Total question count matches `vault stats --release <release>` exact integer.
-- [ ] No request in Network tab for `corpus.json` (the 19 MB static file must not be fetched).
-- [ ] `practice/page.js` transferred size ≤ 300 KB gzipped (verify in DevTools → Network).
-- [ ] FCP ≤ 1.2s (check via Lighthouse).
-- [ ] `X-Vault-Release` header present on `/manifest` response; value = current release.
-
-### 2.2 Practice
-
-- [ ] Navigate to `/practice`.
-- [ ] Filter by track → results update.
-- [ ] Filter by level → results update.
-- [ ] Filter by zone → results update.
-- [ ] Combination filter (track + level + zone) returns expected subset.
-- [ ] Reveal answer on a question → solution renders (Markdown + KaTeX if applicable).
-- [ ] Navigate a chained question → "Part N of M" badge visible BEFORE reveal.
-- [ ] Click chain-badge link → chain sibling list opens.
-- [ ] AskInterviewer tutor → ask a question → response arrives within 10s, no errors.
-- [ ] Reveal → AskInterviewer switches to study mode; tutor knows canonical answer.
-
-### 2.3 Gauntlet
-
-- [ ] Start a gauntlet session with filter → session launches.
-- [ ] Complete N questions (at least 3, mix of right and wrong) → scores tracked.
-- [ ] View post-mortem → per-question feedback shown.
-- [ ] Navigate back to landing → session marked complete in localStorage.
-
-### 2.4 Progress
-
-- [ ] `/progress` page loads.
-- [ ] Attempts from §2.3 persist.
-- [ ] Due-count correct against the test-interval logic.
-- [ ] No console errors.
-
-### 2.5 About
-
-- [ ] `/about` loads.
-- [ ] "Read the paper" call-out visible **above the fold** (no scrolling required on a 1920×1080 viewport).
-- [ ] BibTeX snippet renders.
-- [ ] DOI (if registered) clickable.
-- [ ] Release ID + release_hash visible in footer for reproducibility.
-- [ ] Contributor list renders authors from current release's `authors:` fields.
-
-### 2.6 Command palette / search
-
-- [ ] `⌘K` (Mac) / `Ctrl+K` (Windows/Linux) opens modal from any page.
-- [ ] Input placeholder: "Search N questions by title, scenario, or solution."
-- [ ] Type a term → 200ms debounce → results appear with snippet highlights.
-- [ ] Up/Down arrow navigates results.
-- [ ] Enter opens question; `⌘Enter` opens in new tab.
-- [ ] Escape closes modal.
-- [ ] Empty query state: helpful message + browse-by-topic link.
-- [ ] No-results state: "no results for '...'" message + clear-filters CTA.
-- [ ] Mobile (iPhone 15 viewport, 393×852): full-screen modal, no iOS zoom on input focus, touch targets ≥ 44px.
-
-### 2.7 Chain UX
-
-- [ ] On a chained question (e.g., part 2 of 4), pre-reveal chain badge is visible.
-- [ ] Badge text: "Part 2 of 4 — <chain name>".
-- [ ] Badge click → sibling list drawer; shows all chain members with their status (attempted / unattempted).
-- [ ] Analytics events fired: `chain_badge_shown`, `chain_badge_clicked` (check Cloudflare Analytics real-time).
-
-### 2.8 Offline resilience
-
-- [ ] With the site loaded and at least 5 questions visited:
-  - [ ] Open DevTools → Application → Service Workers → verify `sw.js` registered, controlling.
-  - [ ] Network → check "Offline" → reload page.
-  - [ ] Site shell renders.
-  - [ ] Previously-visited question detail pages load from SW cache.
-  - [ ] "Serving from cache" indicator visible.
-- [ ] Toggle back online → SW revalidates manifest → indicator disappears.
-
----
-
-## 3. Network + bundle verification
-
-- [ ] **No `corpus.json` fetch** anywhere in the user journey (Network tab filter `corpus`).
-- [ ] **Request to `/manifest` returns < 5 KB.**
-- [ ] **Request to `/questions/<id>` returns < 10 KB and has correct `ETag` format** `"<release>:<resource>:<content_hash>"`.
-- [ ] **304 behavior**: hard-refresh a just-visited question → browser sends `If-None-Match` → Worker returns 304.
-- [ ] **Cache API hit on warm**: refresh → Network tab shows `from disk cache` or `from service worker` for manifest/taxonomy.
-- [ ] **No console errors** across all flows above.
-- [ ] **No CSP violations** (DevTools → Console filter `Content-Security-Policy`).
-
----
-
-## 4. Rollback drill (executed on staging before production cutover)
-
-Rehearsal, not optional. Log steps + timings in the operator log.
-
-- [ ] Staging site warm with an active service worker (user has visited ≥10 questions).
-- [ ] Set `NEXT_PUBLIC_VAULT_FALLBACK=static` in the site environment.
-- [ ] Redeploy site (one command).
-- [ ] **Timer start.**
-- [ ] User reloads tab.
-- [ ] Service worker evicts stale release-keyed entries.
-- [ ] Site loads from static inlined corpus + manifest.
-- [ ] Question detail pages render.
-- [ ] No console errors.
-- [ ] AskInterviewer: if worker is still up, tutor works; if down, graceful "tutor temporarily unavailable" indicator.
-- [ ] **Timer stop.** Target: rollback complete + user-visible within 10 minutes. Record actual.
-- [ ] Restore `NEXT_PUBLIC_VAULT_FALLBACK` unset; redeploy; verify Worker-backed state resumes.
-
-If ANY step is red, do NOT proceed to production cutover. File an issue and fix the rollback path first.
-
----
-
-## 5. Post-cutover watch (first 48 hours on production)
-
-- [ ] Dashboard watch scheduled: 30 min, 2h, 6h, 12h, 24h, 48h checkpoints.
-- [ ] At each checkpoint:
-  - [ ] Transport SLIs green.
-  - [ ] All data-plane SLIs green (row-count, content-hash sample, FTS5, schema_fingerprint, release-id propagation).
-  - [ ] Search latency p99 within budget.
-  - [ ] Error-tracker: no new Sentry clusters.
-  - [ ] Cost ledger: D1 row-reads tracking within 2× forecast.
-- [ ] At 48h: post-cutover review with user; decide on Phase 5 kickoff.
-
----
-
-## 6. Rollback trigger — when to abort
-
-If any of the following occur within the first 48h, trigger rollback via `NEXT_PUBLIC_VAULT_FALLBACK=static`:
-
-- 5xx rate > 5% sustained for > 2 min.
-- p99 latency > 1 s sustained for > 5 min.
-- Any data-plane SLI red for > 10 min without explanation.
-- Schema-fingerprint mismatch that persists past a single POP cold-start cycle.
-- User-visible content corruption (question renders differently from staging).
-- Cost forecast exceeded by > 3× over any 1-hour window.
-
-Rollback does NOT require another user approval — this checklist pre-authorizes the operator to roll back on trigger conditions. Forward-fix decisions (vs rollback) are user-approval-gated.
-
----
-
-## 7. Post-cutover sign-off
-
-After 48h clean watch:
-
-- [ ] Final `vault smoke-test --env production --samples 100` green.
-- [ ] Operator log committed to `interviews/vault/releases/<version>/cutover-log.md`.
-- [ ] Retention policy noted: keep `corpus.json` in site bundle until first schema-major bump OR 2 releases post-cutover, whichever is later (ARCHITECTURE.md §7.1).
-- [ ] Phase 4 marked complete in the project tracker.
-- [ ] Post-mortem session scheduled if anything from §6 triggered during watch window.
-
----
-
-**End of cutover checklist.** File at `interviews/vault-cli/docs/CUTOVER_QA.md` — keep in sync with ARCHITECTURE.md and TESTING.md.
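The deleted checklist's §3 asserts the ETag shape `"<release>:<resource>:<content_hash>"`. If an operator prefers scripting that check over eyeballing DevTools, a sketch follows (the format string comes from the checklist; `parseVaultETag` is a hypothetical helper, not the Worker's actual code):

```typescript
interface VaultETag {
  release: string;
  resource: string;
  contentHash: string;
}

// Accepts a strong ETag like "\"2026.04:question-cloud-0001:abc123\"";
// returns null for anything that doesn't match the three-part shape.
function parseVaultETag(header: string): VaultETag | null {
  const m = header.match(/^"([^"]+)"$/); // strip surrounding quotes
  if (!m) return null;
  const parts = m[1].split(":");
  if (parts.length !== 3 || parts.some((p) => p.length === 0)) return null;
  const [release, resource, contentHash] = parts;
  return { release, resource, contentHash };
}
```

A smoke script could then fetch `/questions/<id>`, parse the `ETag` header, and assert `release` equals the release reported by `/manifest`.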
|
||||
@@ -1,331 +0,0 @@
|
||||
# Resume Plan — Massive Build Session (2026-04-25)
|
||||
|
||||
**Purpose:** hand the next Claude session everything it needs to pick up
|
||||
the day's massive question-generation work without re-discovering state.
|
||||
|
||||
**Current branch:** `feat/massive-build-2026-04-25` (off
|
||||
`audit/vault-schema-folder` ← off `dev`)
|
||||
**Worktree:** `/Users/VJ/GitHub/MLSysBook-vault-audit`
|
||||
**Last commit:** `24d3269c7 feat(vault): Phase 0 — competency_area cleanup + closed-enum hardening`
|
||||
|
||||
---
|
||||
|
||||
## What's already done (do NOT redo)
|
||||
|
||||
### From the audit branch (parent)
|
||||
|
||||
- 4,754 cohort-tagged IDs renamed to clean `<track>-NNNN` form
|
||||
(commit `8a5c3ff3c`).
|
||||
- Redirect map at `interviews/vault/docs/id-renames-2026-04-25.yaml` +
|
||||
`interviews/staffml/src/data/id-redirects.json` — preserves shared
|
||||
links to renamed IDs. Wired into the practice page's `?q=` handler.
|
||||
- 8 Playwright tests passing.
|
||||
- `vault check --strict` clean.
|
||||
|
||||
### From this session (commit `24d3269c7`)
|
||||
|
||||
- **Phase 0 cleanup**: 41 malformed `competency_area` values fixed (e.g.,
|
||||
`data-pipeline-engineering` → `data`, `evaluation` → `cross-cutting`,
|
||||
`tinyml / queueing-theory` → `latency`).
|
||||
- **LinkML schema**: added `CompetencyArea` closed enum. `competency_area`
|
||||
field now references it. Future malformed values fail validation.
|
||||
- **Pydantic validator**: `_area()` field_validator on `Question` rejects
|
||||
anything outside `VALID_COMPETENCY_AREAS`.
|
||||
- **Generator defaults raised**: `batch_size` 12 → 30, `total` 12 → 30,
|
||||
`max_calls` 10 → 20. Gemini's 1M context easily handles 30 cells/call;
|
||||
the 250/day cap rewards bigger batches.
|
||||
- **`MASSIVE_BUILD_RUNBOOK.md`**: the methodology document — read this
|
||||
first if you don't know what to do next.
|
||||
|
||||
### Verified
|
||||
|
||||
- Bundle: 9,224 published, **13 canonical competency areas, 0
|
||||
malformed**.
|
||||
- All 8 Playwright tests pass.
|
||||
- `vault check --strict` clean.
|
||||
|
||||
---
|
||||
|
||||
## Current corpus state
|
||||
|
||||
```
|
||||
Published: 9,224
|
||||
cloud: 4,131 (44.8%)
|
||||
edge: 1,976 (21.4%)
|
||||
mobile: 1,644 (17.8%)
|
||||
tinyml: 1,168 (12.7%)
|
||||
global: 305 ( 3.3%)
|
||||
|
||||
Drafts (status:draft): 275
|
||||
Deleted (dedup archive): 458
|
||||
Total YAMLs: 9,982
|
||||
|
||||
Visual-eligible (published): 17 across 8 of 10 archetypes
|
||||
Missing: collective-communication (0), kv-cache-management (0)
|
||||
|
||||
Top track-area gaps:
|
||||
TinyML/parallelism: 0 of ~90 expected
|
||||
Mobile/parallelism: 0 of ~127 expected
|
||||
Edge/parallelism: 11 of ~152 expected
|
||||
TinyML/networking: 2 of ~90 expected
|
||||
Global L4-L6+: ~13% of expected density
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## API budget
|
||||
|
||||
- **Gemini cap**: 250 calls/day
|
||||
- **Used today (estimate)**: ~30 calls (audit + Phase 0 dry-runs)
|
||||
- **Available**: ~220 calls
|
||||
- **Plan budget**: ~80 calls (40 generation + 30 judge + 10 buffer)
|
||||
- **Headroom remaining**: 140 calls for retries
|
||||
|
||||
---
|
||||
|
||||
## What to do next — execute these phases in order
|
||||
|
||||
Each phase is a single command (or short sequence). Stop after Phase 7
|
||||
or earlier if anything looks wrong.
|
||||
|
||||
### Phase 1 — Run the analyzer (1 minute)
|
||||
|
||||
```bash
|
||||
cd /Users/VJ/GitHub/MLSysBook-vault-audit
|
||||
python3 interviews/vault/scripts/analyze_coverage_gaps.py \
|
||||
--total 100 --published-only
|
||||
```
|
||||
|
||||
Output goes to `interviews/vault/_validation_results/coverage_gaps/<ts>/`.
|
||||
Look at `report.md` for the priority gap ranking. Top cells should be
|
||||
the TinyML/Mobile/Edge parallelism rows and Global L4-L6+ cells.
|
||||
|
||||
### Phase 2 — Bump loop defaults, then run (2-4 hours wall clock; 80 API calls)
|
||||
|
||||
First, bump the loop defaults. **Edit** `interviews/vault/scripts/iterate_coverage_loop.py`:
|
||||
|
||||
| Flag | Current default | New default |
|
||||
|---|---|---|
|
||||
| `--max-iters` | 20 | 30 |
|
||||
| `--max-calls` | 60 | 80 |
|
||||
| `--gen-batch-size` | 12 | 30 |
|
||||
| `--gen-calls-per-iter` | 3 | 4 |
|
||||
| `--judge-chunk-size` | 15 | 25 |
|
||||
|
||||
Specifically lines 220-226 of `iterate_coverage_loop.py`. Update both the
|
||||
`default=N` values AND the help text comments.
|
||||
|
||||
Then run the loop:
|
||||
|
||||
```bash
|
||||
python3 interviews/vault/scripts/iterate_coverage_loop.py \
|
||||
--max-iters 30 \
|
||||
--max-calls 80 \
|
||||
--gen-batch-size 30 \
|
||||
--gen-calls-per-iter 4 \
|
||||
--judge-chunk-size 25 \
|
||||
--visual-each-iter \
|
||||
--gap-threshold 0.8 \
|
||||
--max-drop-rate 0.35
|
||||
```
|
||||
|
||||
Each iteration:
|
||||
- 4 generation calls × 30 cells = 120 questions
|
||||
- 1-2 judge calls
|
||||
- ~5 minutes wall clock
|
||||
|
||||
The loop self-paces and stops on saturation (drop rate > 35%, gap
|
||||
priority < 0.8, or convergence on the same top cell two iters in a
|
||||
row).
|
||||
|
||||
**Expected output**: 600-1,200 generated drafts, 70-75% pass rate via
|
||||
judge, 8-15 iterations before auto-stop.
|
||||
|
||||
### Phase 3 — Quality gate (10 min)
|
||||
|
||||
Spot-read 3 generated drafts per track:
|
||||
|
||||
```bash
|
||||
ls -t interviews/vault/questions/cloud/*.yaml | head -3 | xargs -I{} cat {}
|
||||
ls -t interviews/vault/questions/tinyml/*.yaml | head -3 | xargs -I{} cat {}
|
||||
# etc.
|
||||
```
|
||||
|
||||
Check the visual quality on 2 random visual drafts via Playwright by
|
||||
deep-linking. Open `/practice?q=<id>` for an SVG visual that was just
|
||||
rendered, eyeball whether it fits the column at 720px width without
|
||||
overflow, alt text reads clean, no horizontal scroll.
|
||||
|
||||
### Phase 4 — Promote PASS items + rebuild bundle (5 min)
|
||||
|
||||
```bash
|
||||
python3 interviews/vault/scripts/promote_validated.py
|
||||
PYTHONPATH=interviews/vault-cli/src \
|
||||
python3 -m vault_cli.main build --legacy-json
|
||||
PYTHONPATH=interviews/vault-cli/src \
|
||||
python3 -m vault_cli.main check --strict
|
||||
```
|
||||
|
||||
Acceptance: `vault check --strict` returns exit 0, no orphan chains,
|
||||
`published_count` is up by ~600-900.
|
||||
|
||||
### Phase 5 — Refresh paper artifacts (10 min)
|
||||
|
||||
```bash
|
||||
# vault build re-emits corpus.json to staffml/. Mirror it to vault/:
|
||||
cp interviews/staffml/src/data/corpus.json interviews/vault/corpus.json
|
||||
|
||||
# Then the paper-side regen sequence:
|
||||
cd interviews/paper
|
||||
python3 scripts/analyze_corpus.py # legacy schema corpus_stats.json
|
||||
python3 scripts/generate_figures.py # 4 data figures
|
||||
PYTHONPATH=../vault-cli/src python3 scripts/generate_macros.py
|
||||
# macros.tex + corpus_stats.json (overwrites legacy)
|
||||
|
||||
# Update hardcoded zone counts in paper.tex if shifted:
|
||||
# Line ~867: "diagnosis (1{,}583), fluency (1{,}227), and evaluation (1{,}113)"
|
||||
# Replace with new values from current corpus_stats.json by_zone.
|
||||
|
||||
pdflatex -interaction=nonstopmode paper.tex
|
||||
```
|
||||
|
||||
Acceptance: `Output written on paper.pdf (N pages, ...)` with no
|
||||
"undefined citation" errors in the output (citation warnings are pre-
|
||||
existing and unrelated).
|
||||
|
||||
### Phase 6 — GUI verification (5 min)

```bash
# Restart the dev server fresh:
pkill -f "next-server\|next dev"; sleep 1
cd /Users/VJ/GitHub/MLSysBook-vault-audit/interviews/staffml
(npx next dev > /tmp/staffml-dev.log 2>&1 &)
sleep 8
curl -sI http://localhost:3000/practice 2>&1 | head -1  # expect 200

npx playwright test tests/practice-smoke.spec.ts --reporter=list
```

Acceptance: all 8 tests pass.

Then a manual eyeball: open `http://localhost:3000/practice` in a browser, click the area filter, and confirm exactly 13 canonical entries plus "All". This is the user-facing fix that motivated Phase 0.
### Phase 7 — Atomic commit (3 min)

```bash
cd /Users/VJ/GitHub/MLSysBook-vault-audit
git status --short  # should show vault/questions/ changes + paper artifacts

git add interviews/vault/questions/ \
  interviews/staffml/src/data/corpus.json \
  interviews/staffml/src/data/corpus-summary.json \
  interviews/staffml/src/data/vault-manifest.json \
  interviews/paper/macros.tex \
  interviews/paper/corpus_stats.json \
  interviews/paper/figures/ \
  interviews/paper/paper.tex \
  interviews/vault/_validation_results/

git commit -m "feat(vault): massive build — N drafts generated, M promoted

Phase 1 (analyzer): top priority cells were tinyml/parallelism (0/90),
mobile/parallelism (0/127), edge/parallelism (11/152).
Phase 2 (loop): <ITERS> iterations, <CALLS> API calls, <GEN> generated.
Auto-stop fired on: <SATURATION REASON>.
Phase 3 (quality): spot-read 15 drafts; <Y/N> needed manual edits.
Phase 4 (promote): <K> PASS items promoted; bundle now <P> published.
Phase 5 (paper): macros bumped to <P>, figures rebuilt, zone-count
prose updated.
Phase 6 (GUI): all 8 Playwright tests pass; area filter shows 13
canonical entries.

The runbook (vault/docs/MASSIVE_BUILD_RUNBOOK.md) is the methodology
this session followed; it can be re-run on any future generation day."
```

If the corpus.json hand-edit warning fires, add the trailer:

```
Vault-Override: corpus-json-hand-edit: regenerated via vault build
```
---

## Common saturation outcomes

If Phase 2's loop stops early, the auto-stop reason will be one of:

| Reason | Meaning | What to do |
|---|---|---|
| `top priority gap < 0.8` | The corpus is balanced enough that no cell is desperately empty | This is success. Move to Phase 3. |
| `DROP rate > 35%` | Gemini is hallucinating; the cells being targeted are nonsensical for some tracks | Inspect the latest iteration's `judge_summary.json` to see which cells failed, then add them to `TRACK_TOPIC_BLOCKLIST` in `analyze_coverage_gaps.py`. |
| `same top cell two iters in a row` | The generator can't fill the cell (likely a matplotlib script crashing) | Check `_validation_results/gemini_generation/<latest>/raw_*.txt` for the source code Gemini generated; render it manually with `python3 render_visuals.py --id <id>` to see the error. |
| `max-iters reached` | Hit the iteration cap before saturation | Re-run with a higher cap (`--max-iters 50`) if budget allows. |
| `max-calls reached` | Burned through the API budget | Stop. We're done for the day. |
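The DROP-rate trigger is simple arithmetic over the judge's verdicts. A minimal sketch, assuming `judge_summary.json` boils down to a list of per-item verdict strings (the real file's shape may differ; the sample below is hand-made, not real data):

```python
from collections import Counter

def drop_rate(verdicts):
    """Tally judge verdicts and return the DROP fraction.

    `verdicts` is assumed to be a flat list of verdict strings
    ("PASS" / "NEEDS_FIX" / "DROP") pulled out of judge_summary.json.
    """
    counts = Counter(verdicts)
    total = sum(counts.values())
    return counts.get("DROP", 0) / total if total else 0.0

# Hand-made sample, sized to mirror a 26-call iteration:
sample = ["PASS"] * 12 + ["NEEDS_FIX"] * 4 + ["DROP"] * 10
rate = drop_rate(sample)
print(f"DROP rate: {rate:.1%}")   # 10/26 ≈ 38.5%
print("auto-stop:", rate > 0.35)  # True — this iteration would halt the loop
```

The 35% threshold is the loop's halt condition, not a quality target; a run that stops here needs blocklist or prompt changes, not more calls.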
---

## What NOT to do

These are settled decisions; don't relitigate without explicit user direction:

- ❌ Don't add `<track>/<topic>/` subdirs (ARCHITECTURE.md §3.3 — flat is correct).
- ❌ Don't rename more legacy IDs (already done: 4,754 renamed in commit `8a5c3ff3c`).
- ❌ Don't merge to dev without explicit user OK.
- ❌ Don't push to remote without explicit user OK.
- ❌ Don't change schema enum values (CompetencyArea, Track, Level, Zone, Status, Provenance) — those are the canonical 4-axis taxonomy.
- ❌ Don't auto-promote NEEDS_FIX items; only PASS verdicts go to published.
- ❌ Don't skip the Pydantic validator pass (`vault check --strict`) before commit.
---

## Files of interest (for context)

| File | Why |
|---|---|
| `interviews/vault/docs/MASSIVE_BUILD_RUNBOOK.md` | The full day's methodology. Read first. |
| `interviews/vault/audit/2026-04-25-schema-folder-audit.md` | Why the schema/folder is shaped the way it is. |
| `interviews/vault/CHANGELOG.md` | History of the v0.1 → v1.0 migration and what it fixed. |
| `interviews/vault/ARCHITECTURE.md` §3.3 | Why path-as-classification was rejected. |
| `interviews/vault/docs/ID_SCHEMES.md` | Why IDs are `<track>-NNNN`. |
| `interviews/vault/docs/id-renames-2026-04-25.yaml` | The 4,754 cohort→clean rename map. |
| `interviews/vault/scripts/iterate_coverage_loop.py` | The day's main driver. |
| `interviews/vault/scripts/analyze_coverage_gaps.py` | Priority ranking. |
| `interviews/vault/scripts/gemini_cli_generate_questions.py` | Batched Gemini generation. |
| `interviews/vault/scripts/gemini_cli_llm_judge.py` | Multi-criteria validator. |
| `interviews/vault/scripts/promote_validated.py` | Lifecycle flip. |
| `interviews/vault/scripts/render_visuals.py` | DOT/matplotlib → SVG. |
| `interviews/vault/scripts/fix_competency_areas.py` | Phase 0 cleanup script (one-time, can re-run safely). |
---

## One-liner status check (run first in next session)

```bash
cd /Users/VJ/GitHub/MLSysBook-vault-audit && \
git log --oneline -5 && echo "---" && \
git status --short | head -10 && echo "---" && \
PYTHONPATH=interviews/vault-cli/src \
  python3 -m vault_cli.main check --strict 2>&1 | tail -3 && \
echo "---" && \
python3 -c "
import json
c = json.load(open('interviews/staffml/src/data/corpus.json'))
print(f'published: {len(c)}')
visuals = [q for q in c if q.get('visual')]
print(f'with visuals: {len(visuals)}')
print('areas:', sorted(set(q['competency_area'] for q in c)))
"
```

If the output shows commit `24d3269c7`, a clean tree, `vault check` passing, and 13 canonical areas — the resume state is healthy. Proceed to Phase 1.
@@ -1,273 +0,0 @@
# Resume Plan — Phase D/E/F (Priority Gap Closure + Generator Leverage)

**Purpose:** hand the next Claude session everything it needs to close the parallelism + global L4-L6+ gaps that have remained open across two prior multi-phase pushes, plus three high-leverage generator improvements that pay for themselves on every future run.

**Companion docs (same branch):**

- `RESUME_PLAN_2026-04-25.md` — Phase 1-7 (committed at `ece6eccf2`)
- `RESUME_PLAN_RELEASE.md` — Phase A (committed at `542aaf95d`)
- this doc — Phase D/E/F

---
## Current state

| | |
|---|---|
| **Worktree** | `/Users/VJ/GitHub/MLSysBook-massive-build` |
| **Branch** | `feat/massive-build-2026-04-25-run` |
| **HEAD** | `e7cd3b24c feat(vault): Phase B + C — 144 PASS items added (B.5: 110, C.4: 34)` |
| **Bundle** | 9,688 published (was 9,224 at branch cut, +464 net) |
| **All gates** | green (`vault check --strict`, lint, doctor, codegen, validate-vault, render) |
---

## What's already done (do NOT redo)

### Phase A (commit `542aaf95d`)

- 3 structural Pydantic validators added: `visual.path-resolves`, `_zone_bloom_compatible`, `disk-coverage`
- Lint calibration via 4-expert consensus (1,308 → 0 warnings)
- Registry repaired (5,269 IDs appended); doctor split into `disk-coverage` (HARD) + `registry-history` (INFO)
- Chain integrity full pass (0 errors / 0 warnings)
- Practice page zoom modal + 9th Playwright test

### Phase B (in commit `e7cd3b24c`)

- Generator hardened: `bloom_for_zone_level()` respects ZONE_BLOOM_AFFINITY; the prompt requires a `bloom_level` field, lists the 13 canonical competency_areas inline, and demands L5/L6+ depth (no trivial division framings).
- **Validate-at-write**: every Gemini-emitted YAML round-trips through `Question.model_validate()` before the disk write.
- The B.5 loop saturated at iter 4 on `DROP rate 38.3% > 35%` (judge tightening on L6+ depth, not budget). Yield: 110 PASS in 26 calls.

### Phase C (in commit `e7cd3b24c`)

- 120 NEEDS_FIX items from the prior session re-edited via fix-agent (92 edited, 28 already resolved).
- Re-judge: 67 of 92 judged → 34 PASS / 13 NEEDS_FIX / 20 DROP. The 34 PASS were promoted.

### Saturation reasons (carry-forward signal)

- B.5: `DROP rate 38.3% exceeds 35% — likely hallucination`. The judge rejects nearly half of the L6+ depth items even with the strengthened prompt. Adding more API calls won't help; deeper prompt scaffolding will.
- C.3: 25 of 92 items unjudged (max-calls=5 chunk cap).
---

## What's still open (Phase D/E/F)

### Priority gaps that remain (three parallelism, three global)

| Gap | Current | Expected | Status |
|---|---|---|---|
| `tinyml/parallelism` (area-level) | 1 | ~95 | **never closed** |
| `mobile/parallelism` (area-level) | 0 | ~134 | **never closed** |
| `edge/parallelism` (area-level) | 13 | ~159 | barely moved |
| `global/realization/L4-L6+` | 0 | ~14 | empty |
| `global/specification/L6+` | 0 | ~5 | empty |
| `global/mastery/L5` | 0 | ~5 | empty |
**Why prior runs didn't close them**: the analyzer's recommended_plan picks **topic-level** cells (queueing-theory, memory-hierarchy-design, etc.) by priority, but the parallelism gap aggregates across multiple parallelism-flavored topics (pipeline-parallelism, collective-communication, kv-cache-management, interconnect-topology). None of those individual topic cells cracks the top-100 priority list, so the loop never targets them. Closing the area-level gap requires **hand-built topic targets**, bypassing the analyzer.
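The aggregation effect can be sketched in a few lines. The per-topic gaps and the cutoff below are illustrative numbers, not the corpus's real figures:

```python
# Illustrative per-topic gaps for mobile parallelism-flavored topics:
# each topic cell's gap is modest on its own, so none cracks a
# hypothetical per-cell priority cutoff — but the area-level sum is large.
topic_gap = {
    "pipeline-parallelism": 30,
    "collective-communication": 35,
    "kv-cache-management": 33,
    "interconnect-topology": 36,
}
TOP100_CUTOFF = 60  # hypothetical gap size needed to make the top-100 list

per_topic_visible = [t for t, g in topic_gap.items() if g >= TOP100_CUTOFF]
area_gap = sum(topic_gap.values())

print(per_topic_visible)  # [] — no single topic cell makes the list
print(area_gap)           # 134 — the gap only shows at area level
```

This is the structural blind spot E.3's `--include-areas` flag is meant to fix: weight cells by track×area gap, not only track×topic gap.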
### Three carry-forwards from C.3

- 25 unjudged items — the max-calls cap left them on the table
- 13 still NEEDS_FIX after one fix attempt — a second fix pass is possible
- 20 DROP items — could be salvaged with a deeper rewrite
---

## Phases D + E + F

### Phase D — Priority gap closure (THE mission, finally)

| ID | Task | Acceptance | Effort |
|---|---|---|---|
| D.1 | Hand-author **~50 parallelism targets** as `track:topic:zone:level` strings. Topics: `pipeline-parallelism`, `collective-communication`, `kv-cache-management`, `interconnect-topology`. Tracks: edge/mobile/tinyml at L4-L6+. Skip cloud (already dense). Save to `tools/phase_d/parallelism_targets.txt`. | File written, ≥40 cells, all 4 topics represented | 30 min |
| D.2 | Author a **parallelism-specific prompt variant** in the generator. Adds these rules: (a) forbid bandwidth-division framings (`payload / bandwidth`); (b) require a concrete topology (NVLink/IB/PCIe/RoCE/LoRa) appropriate to the track; (c) require a synchronization or bubble cost in the question; (d) require non-trivial system integration. Toggle via a `--prompt-variant parallelism` CLI flag. | Manual test: feed 5 cells, judge ≥3 of 5 PASS at high confidence | 1.5 hr |
| D.2' | **REVIEW CHECKPOINT** — surface the prompt + 5 sample drafts for user review before D.3 burns API budget | User signs off | — |
| D.3 | Run a focused loop (15-20 API calls, batch_size 30) targeting D.1's hand-built cells with `--prompt-variant parallelism` | Loop summary: ≥20 PASS items in parallelism cells | 2 hr wall clock |
| D.4 | Spot-read all PASS items from D.3 (~30-50); reject any that read as bandwidth math (manual edit to set `status: archived`, or rewrite). Promote the rest. | All promoted items have non-trivial framings | 30 min |
| D.5 | Same mechanism for **global L4-L6+**: hand-author ~20 cells, run a focused loop with the **standard prompt** (global cells aren't parallelism-flavored, just under-filled). | ≥10 global L4-L6+ PASS items | 2 hr wall clock |
| D.6 | Promote, rebuild the bundle, regen paper artifacts | `vault check --strict` clean; published count up by 30-60 | 30 min |
**Phase D total**: ~7 hr work, ~5 hr wall clock, ~30-40 API calls.
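D.1's target file is a cross product over a few short lists. A minimal sketch, assuming the `track:topic:zone:level` line format from the task; the zone names are an illustrative subset (the real vocab lives in the vault schema), and a temp path stands in for `tools/phase_d/parallelism_targets.txt`:

```python
import tempfile
from itertools import product
from pathlib import Path

tracks = ["edge", "mobile", "tinyml"]          # skip cloud, per D.1
topics = [
    "pipeline-parallelism", "collective-communication",
    "kv-cache-management", "interconnect-topology",
]
zones = ["diagnosis", "evaluation"]            # illustrative subset of the zone vocab
levels = ["L4", "L5", "L6+"]

# One target line per (track, topic, zone, level) cell:
targets = [
    f"{track}:{topic}:{zone}:{level}"
    for track, topic, zone, level in product(tracks, topics, zones, levels)
]

# D.1 saves to tools/phase_d/parallelism_targets.txt; a temp dir is used here.
out = Path(tempfile.mkdtemp()) / "parallelism_targets.txt"
out.write_text("\n".join(targets) + "\n")
print(f"{len(targets)} cells")  # 3 * 4 * 2 * 3 = 72, comfortably ≥ 40
```

In practice the hand-authored list would prune cells that make no sense for a track rather than taking the full product.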
### Phase E — Generator efficiency (compounding leverage)

| ID | Task | Acceptance | Effort | Saves |
|---|---|---|---|---|
| E.1 | **Retry-on-validation-fail** in `gemini_cli_generate_questions.py`. If `Question.model_validate()` rejects, single retry with the prompt suffix `"your previous JSON had these violations: <list>. Re-emit only the failed items, fixed."` A second failure logs a structured error and skips. | Unit test: feed a bad dict → the script retries once and recovers | 45 min | ~50% of API calls (B.5's iter 1 + iter 3 lost 8 of 26 = 31%) |
| E.2 | **Auto-update vault-manifest.json from `vault build`**. Currently maintained by hand; pre-commit caught the gap twice this session. | `vault build --legacy-json` writes a fresh manifest with current counts + hash | 30 min | Manifest-stale failures eliminated |
| E.3 | **Tighten the analyzer**: add an `--include-areas parallelism,networking` flag so the recommended_plan can include cells weighted by track×area gap (not only track×topic gap). Solves the structural issue that drove D.1's hand-authoring. | A run with `--include-areas parallelism` returns a plan with ≥10 parallelism-topic cells | 1 hr | Future runs don't need D.1's hand-build step |

**Phase E total**: ~2.5 hr.
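E.1's retry flow can be sketched end to end. Everything here is a stand-in: `validate()` mimics `Question.model_validate()` without importing pydantic, and `fake_model` plays Gemini, failing once and recovering on the retry:

```python
RETRY_SUFFIX = (
    "your previous JSON had these violations: {violations}. "
    "Re-emit only the failed items, fixed."
)

def validate(item):
    """Stand-in for Question.model_validate(): return a violation list."""
    missing = [f for f in ("id", "prompt", "bloom_level") if f not in item]
    return [f"missing field: {f}" for f in missing]

def generate_with_retry(call_model, prompt):
    """Call the model once; on validation failure, retry once with the
    violations appended. A second failure returns (None, errs) so the
    caller can log a structured error and skip."""
    item = call_model(prompt)
    errs = validate(item)
    if not errs:
        return item, []
    retry_prompt = prompt + "\n" + RETRY_SUFFIX.format(violations="; ".join(errs))
    item = call_model(retry_prompt)
    errs = validate(item)
    return (item, []) if not errs else (None, errs)

# Fake model: the first call omits bloom_level; the retry fixes it.
calls = {"n": 0}
def fake_model(prompt):
    calls["n"] += 1
    out = {"id": "mobile-9999", "prompt": "question text"}
    if calls["n"] > 1:
        out["bloom_level"] = "analyze"
    return out

item, errs = generate_with_retry(fake_model, "generate one question")
print(calls["n"], errs == [])  # 2 True — one retry, then recovered
```

The unit-test acceptance in the E.1 row is exactly this shape: a bad dict in, one retry, recovery out.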
### Phase F — Residual cleanup (completeness)

| ID | Task | Acceptance | Effort |
|---|---|---|---|
| F.1 | **Re-judge the 25 unjudged items** from C.3. Use the same fix-agent-edited paths from `tools/phase_c/needs_fix_manifest.json`. | 25 items judged; promote any that flipped to PASS | 20 min |
| F.2 | **Second-pass fix-agent** on the remaining 13 NEEDS_FIX + 20 DROP from C.3. Spawn a `general-purpose` agent with the C.3 judge's verdicts as input. | Each item edited and re-judged; promote any that flipped | 1 hr |
| F.3 | **Spot-read 20 PASS items** stratified across this push's promotions (Phase B + C combined = 144 items). Rejection bar: shallow framings, math errors, hardware-spec inaccuracies. | Reviewed list saved; rejection rate ≤ 10% | 1 hr |

**Phase F total**: ~2.5 hr.
---

## Parallelism map (what can run concurrently)

The cleanest interleaving:

```
Stage 1 — sequential prep (no API)           ~3 hr
  D.1 (hand-build targets)
  └── D.2 (parallelism prompt)
      └── E.1 (retry-on-validate-fail)
          └── E.2 (auto-manifest)
              └── E.3 (analyzer flag)
                  └── (D.2' user review)

Stage 2 — parallel execution                 ~2 hr wall clock
  D.3 (parallelism loop, 15-20 calls)  ━┓
                                        ┣━ both write disjoint IDs
  F.2 (fix-agent on 33 items)          ━┛   no race risk

Stage 3 — parallel execution                 ~2 hr wall clock
  D.5 (global loop, 10-15 calls)       ━┓
                                        ┣━ all disjoint
  F.1 (re-judge 25 unjudged)           ━┫
                                        ┃
  F.3 (spot-read first 10 of 20)       ━┛   read-only

Stage 4 — sequential finalize                ~1 hr
  D.4 (parallelism spot-read + promote)
  └── D.6 (rebuild bundle, regen paper)
      └── F.3 (finish spot-read second 10)
          └── final commit
```

**Total wall clock**: ~8 hr (vs ~10-12 hr serial).

**API budget**: ~30-40 calls expected (the Gemini cap is 250/day; today used ~76, so ~174 remaining).

### Parallelism safety rules

1. **No two generation loops concurrent** — both call `next_id_for_track()`, which is filesystem-stat-based; concurrent calls can race on the next ID. D.3 must finish before D.5 starts.
2. **Generation loop + fix-agent OK** — disjoint ID ranges (the loop writes new files, the agent edits existing ones).
3. **Generation loop + judge OK** — the judge reads files; it doesn't write to questions/.
4. **No schema changes during loops** — a schema change invalidates the validate-at-write contract mid-stream.
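Rule 1's race is worth seeing concretely. The real `next_id_for_track()` isn't reproduced here; this is a minimal model of any stat-then-write allocator, showing the duplicate ID two concurrent loops would both receive:

```python
import re
import tempfile
from pathlib import Path

def next_id_for_track(questions_dir, track):
    """Minimal model of a filesystem-stat-based allocator: scan the
    existing <track>-NNNN.yaml files and return max+1. Not atomic —
    two loops that both scan before either writes get the same ID."""
    pat = re.compile(rf"{track}-(\d+)\.yaml$")
    nums = [int(m.group(1))
            for p in Path(questions_dir).glob(f"{track}-*.yaml")
            if (m := pat.search(p.name))]
    return f"{track}-{(max(nums) if nums else 0) + 1:04d}"

with tempfile.TemporaryDirectory() as d:
    Path(d, "mobile-1962.yaml").touch()
    # Two concurrent loops both stat the directory before either writes:
    loop_a = next_id_for_track(d, "mobile")
    loop_b = next_id_for_track(d, "mobile")
    print(loop_a, loop_b)  # mobile-1963 mobile-1963 — duplicate, hence rule 1
```

Serializing the loops (D.3 before D.5) sidesteps the race without adding locking to the allocator.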
---

## Locked decisions (do NOT relitigate)

| Decision | Choice |
|---|---|
| **Release tag** | One stable dev branch, no mid-stream release tag (per the prior plan) |
| **Bloom canonical** | When zone and bloom conflict, trust bloom; reclassify the zone via `BLOOM_CANONICAL_ZONE` |
| **Validate-at-write severity** | ERROR (Pydantic hard-rejects), not WARN |
| **D.2 prompt authorship** | Claude drafts, user reviews at D.2' |
| **Test-first for E.x** | Unit tests before real API calls (cheaper failure mode) |
---

## Review checkpoints

1. **D.2'** — surface the parallelism prompt + 5 sample drafts for user review before D.3 fires the loop.
2. **D.4** — surface PASS items for spot-read; the user can flag any that read shallow.
3. **Final** — surface all gates green + the commit summary.
---

## Common saturation outcomes for D.3 / D.5

If D.3 stops early:

| Reason | Meaning | What to do |
|---|---|---|
| `DROP rate > 35%` | The judge is rejecting parallelism items as too shallow | Inspect the latest iteration's `judge_summary.json` — if the rejections are about "trivial topology" framings, tighten the D.2 prompt further. If they're about correctness errors, accept the saturation. |
| `same top cell two iters` | The generator can't fill the cell | Hit the budget cap; move on and document it as a ceiling |
| `max-calls reached` | Burned through the API budget | Stop. Commit what we have. |
| `0 drafts produced` | Validate-at-write rejected the entire batch | E.1's retry should have prevented this; if it persists, dump the prompt and inspect Gemini's raw output |
---

## What NOT to do

- ❌ Don't merge to `dev` until all gates are green AND the user explicitly OKs.
- ❌ Don't push to remote without explicit user OK.
- ❌ Don't run two generation loops concurrently (next-id race).
- ❌ Don't add `Co-Authored-By` lines or automated attribution footers.
- ❌ Don't change ZONE_BLOOM_AFFINITY or schema enum values without explicit user direction.
- ❌ Don't auto-promote NEEDS_FIX without a re-judge.
- ❌ Don't suppress lint warnings or skip pre-commit hooks (`--no-verify` forbidden).
- ❌ Don't auto-cut a release tag (`v0.1.2`) — a single stable commit is the goal.
- ❌ Don't navigate to or modify files in sibling worktrees.
---

## Files of interest

| File | Why |
|---|---|
| `interviews/vault/docs/RESUME_PLAN_2026-04-25.md` | Phase 1-7 history |
| `interviews/vault/docs/RESUME_PLAN_RELEASE.md` | Phase A history |
| `interviews/vault/docs/MASSIVE_BUILD_RUNBOOK.md` | Methodology document |
| `interviews/vault/_validation_results/coverage_loop/20260425_192956/` | Most recent loop output (B.5) — judge_summary.json per iter, NEEDS_FIX details with fix_suggestions |
| `interviews/vault/_validation_results/phase_c_rejudge/judge_summary.json/20260425_201121/summary.json` | C.3 re-judge verdicts |
| `tools/phase_c/needs_fix_manifest.json` | The 120-item NEEDS_FIX queue (the 13 still-pending + 20 DROP go here for F.2) |
| `tools/phase_b/cell_triage.json` | The 14 L6+/L5-deep cells (a subset of what D.2's prompt should target) |
| `interviews/vault/scripts/gemini_cli_generate_questions.py` | **D.2 + E.1 edit here.** |
| `interviews/vault/scripts/analyze_coverage_gaps.py` | **E.3 edits here.** |
| `interviews/vault-cli/src/vault_cli/commands/build.py` (or equivalent) | **E.2 edits here** to write the manifest. |
| `interviews/vault/schema/enums.py` | ZONE_BLOOM_AFFINITY + BLOOM_CANONICAL_ZONE + widened ZONE_LEVEL_AFFINITY (do not edit lightly) |
---

## One-liner status check (run first in next session)

```bash
cd /Users/VJ/GitHub/MLSysBook-massive-build && \
git log --oneline -3 && echo "---" && \
git status --short | head -5 && echo "---" && \
PYTHONPATH=interviews/vault-cli/src \
  python3 -m vault_cli.main check --strict 2>&1 | tail -2 && \
echo "---" && \
PYTHONPATH=interviews/vault-cli/src \
  python3 -m vault_cli.main lint interviews/vault/questions/ 2>&1 | tail -2 && \
echo "---" && \
PYTHONPATH=interviews/vault-cli/src \
  python3 -m vault_cli.main doctor 2>&1 | grep -cE "fail" | xargs -I{} echo "doctor fails: {}" && \
echo "---" && \
python3 -c "
import json
c = json.load(open('interviews/staffml/src/data/corpus.json'))
print(f'published: {len(c)}')
"
```

If the output shows commit `e7cd3b24c`, a clean tree, `vault check` passing, 0 lint warnings, 0 doctor fails, and 9,688 published — the resume state matches this plan's starting assumptions. **Proceed to D.1.**

If anything differs, **stop and reconcile** before any code edits.
---

## Pacing

This is a ~12-15 hour push compressed to ~8 hr wall clock by the parallelism map. Plausibly two focused sessions, or one long one.

The biggest risk is D.3 saturating at low yield (<10 parallelism PASS items). If that happens, D.5 becomes the only material content gain of this push, and the parallelism gap stays open as a documented limitation rather than a closed mission. That is acceptable — the branch was already StaffML-day-ready before Phase D started.

The smallest budget commitment is Phase E (no API calls; pure generator infra). If only one phase fits, do E — it compounds for every future generation run, while D is a one-time content gain.

There are three explicit user-review checkpoints (D.2', D.4, final). Wait for sign-off at each before continuing.
@@ -1,314 +0,0 @@
# Resume Plan — Release-Ready Cleanup + Balanced Generation (2026-04-25)

**Purpose:** hand the next Claude session everything it needs to take `feat/massive-build-2026-04-25-run` from "ships with caveats" to "stable dev branch ready for StaffML day."

**Companion doc:** `interviews/vault/docs/RESUME_PLAN_2026-04-25.md` (the prior session's plan — completed through Phase 7, commit `ece6eccf2`).
---

## Current state

| | |
|---|---|
| **Worktree** | `/Users/VJ/GitHub/MLSysBook-massive-build` |
| **Branch** | `feat/massive-build-2026-04-25-run` (off `feat/massive-build-2026-04-25` in `vault-audit`) |
| **HEAD** | `ece6eccf2 feat(vault): massive build — 630 drafts generated, 320 PASS promoted, paper 0.1.1` |
| **Parent branch** | `feat/massive-build-2026-04-25` in `MLSysBook-vault-audit`, untouched |

**`dev` has advanced** since this branch was cut (was `4a7c64585`, now `72a741aa1`). A future merge to `dev` will need a rebase or merge resolution. **Do not merge yet** — finish the cleanup + balanced generation first.
---

## What's already done (do NOT redo)

### From commit `ece6eccf2` (this session, 2026-04-25)

- 6-iter Gemini coverage loop ran; 50 of 80 API calls used.
- **630 drafts generated**, **320 PASS promoted** to published.
- Bundle: `9,224 → 9,544 published` (+320 exact).
- 234 visual assets mirrored to `staffml/public/question-visuals/`.
- Paper artifacts refreshed against the new `0.1.1` release (`release_hash: 0350da5706e6`); `paper.pdf` compiles to 25 pages.
- Loop defaults bumped: `max-iters 30`, `max-calls 80`, `batch 30`, `calls/iter 4`, `judge-chunk 25`.
- `fix_competency_areas.py` REMAP table extended with 30+ new patterns (zones-as-area, bloom-verbs-as-area, underscore hallucinations, dash/slash track-prefix forms). All 462 malformed drafts are now canonical.
- `vault-manifest.json` refreshed: questionCount 9,224 → 9,544, contentHash 539eb877f9cc → 0350da5706e6.
- All 8 Playwright tests pass.
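The REMAP mechanism is a lookup from malformed `competency_area` spellings to canonical names. A sketch of the idea, with wholly hypothetical key/value pairs standing in for the real table in `fix_competency_areas.py` (the canonical names here are guesses, not the schema's actual enum):

```python
# Hypothetical REMAP entries, one per pattern class named above —
# the real table is much larger and lives in fix_competency_areas.py.
REMAP = {
    "memory_hierarchy": "memory",       # underscore hallucination
    "tinyml-power": "power",            # dash track-prefix form
    "mobile/networking": "networking",  # slash track-prefix form
}

def canonicalize(area: str) -> str:
    """Normalize a malformed competency_area string to its canonical
    form; strings already canonical pass through unchanged."""
    key = area.strip().lower()
    return REMAP.get(key, key)

print(canonicalize("  TinyML-Power "))    # power
print(canonicalize("mobile/networking"))  # networking
print(canonicalize("power"))              # already canonical, unchanged
```

A dict lookup after a strip/lower pass keeps the fix idempotent, which is why the Phase 0 script can be re-run safely.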
### Saturation reason (carry-forward signal)

`same top-priority cell two iterations in a row — converged`. The top priority's decay 2.25 → 2.14 → 2.03 → 1.93 → 1.83 plateaued. Both halt conditions (gap-threshold 0.8, max-calls 80) had headroom remaining; **structural convergence fired first**. The generator cannot meaningfully shrink `tinyml/specification/L6+` further within the current prompt framing. **This is the central problem Phase B addresses.**
---

## Audit findings from this session (so the next session does not rediscover)

### 1. Distribution closure — PARTIAL FAILURE

The 320 PASS items did NOT close the priority gaps the analyzer flagged:

| Targeted gap | Before | After | Δ | % gap closed |
|---|---|---|---|---|
| tinyml/parallelism | 0 | 1 | +1 | 1% |
| tinyml/networking | 2 | 11 | +9 | 10% |
| **mobile/parallelism** | **0** | **0** | **+0** | **0%** |
| edge/parallelism | 11 | 13 | +2 | 1% |
| global L4–L6+ | 189 | 189 | +0 | 0% |

Where they actually landed: `mobile/memory` (16), `mobile/networking` (15), `tinyml/cross-cutting` (13), `tinyml/power` (13), `mobile/data` (13). All useful, none on the original priority list. **Phase B's job is to close the actual targeted cells** with prompt templates engineered for the content type, not just more API calls.
Why parallelism failed: the judge DROPped most parallelism drafts for "too-shallow framing" (e.g., the `cloud-4490` verdict: *"Simple division of payload by bandwidth is too trivial for L6+ Staff level"*). The fix is **template-level, not budget-level**.
### 2. Schema completeness — STRONG (with one defect)

- 320/320 PASS items have full `details.{realistic_solution, common_mistake, napkin_math}` ✓
- 135/136 visual references resolve to a real SVG ✓
- 1 defect: **`mobile-1962`'s graphviz render crashed silently** — only the `.dot` source exists, no `.svg`. The judge passed it because the YAML was structurally valid. `render_visuals.py` does not propagate failures.
### 3. Quality at scale — ~7.5/10 average across 10 stratified items

Strong: `edge-2431` (Jetson NvSciBuf zero-copy), `tinyml-1658` (256KB SRAM cliff diagnosis), `mobile-1923` (UFS write-amplification), `tinyml-1635` (closed-form duty-cycle), `edge-2313` (Hailo-8 PCIe pipeline bubble). The math is correct in all 10, with real hardware grounding in all 10.

Weak: `edge-2423` (asks for a "standard programming pattern" — too generic, OS-textbook style).
### 4. All-checks audit

| Gate | Result |
|---|---|
| `vault check --strict` | ✓ 0 errors / 0 invariant failures |
| `vault doctor / release-integrity` | ✓ 0.1.1 verified |
| `vault doctor / content-hash-sample` | ✓ 20/20 sampled hashes match |
| `vault doctor / registry-integrity` | ✗ 5,269 missing from the registry; 4,479 registry orphans |
| `vault lint` | 0 errors / **1,308 warnings** (all `zone-level-affinity`; 303 on new items, 1,005 pre-existing) |
| Playwright (8 tests) | ✓ all pass |
| Pre-commit hook | ✓ (after the manifest refresh) |

**Registry drift forensics (cause resolved; not a worktree issue):** The registry is identical across all 3 worktrees (MD5 `a9a259c559cc23b03ca371683ad81d6d`). The 4,479 orphan registry entries are old cohort-tagged IDs (`tinyml-exp2-desi-0184`, `cloud-fill-04027`, `tinyml-cell-13251`) left over from commit `8a5c3ff3c`'s rename refactor, which updated the YAMLs but never appended to the registry. The 5,269 disk orphans are: 4,754 renamed-INTO clean IDs + 320 from this session + ~195 prior-run unappended items. **94% of the drift pre-existed this session.**
---

## Locked decisions (do NOT relitigate)

| Decision | Choice |
|---|---|
| **A.6 lint calibration** | Spawn 4 expert agents on a stratified sample of disputed (zone, level) pairs; consolidate via `consensus-builder`; widen the rule for accepted pairs, reclassify items in rejected pairs, ack-list disputed pairs. Must hit **0 lint warnings** before proceeding. |
| **A.7 chain integrity** | Fix the data — a full pass on the 29 single-question chains + 101 non-sequential. Not the relaxation shortcut. |
| **A.8 zoom UX** | `react-medium-image-zoom` (4KB, click-to-zoom modal, ESC closes). Lightest + most responsive. |
| **B.3 prompt authorship** | Claude drafts; user reviews before B.5 fires the loop. |
| **Release cadence** | One stable dev branch at the end. No mid-stream release tags. The user's framing: *"I just want the dev branch to come to a stable point for StaffML day."* |
---

## Review checkpoints (pause for user input)

1. **After A.6.3 expert consensus lands** — before applying the calibration to the lint rule.
2. **After B.3 prompt drafts are written** — before B.5 fires the generation loop and burns API budget.
3. **Before D.2** — the final atomic commit; the user confirms the branch is stable-state ready.
---
|
||||
|
||||
## Phase A — Cleanup (sequential, blocking everything else; ~7-8 hr)
|
||||
|
||||
| ID | Task | Acceptance criterion | Effort |
|---|---|---|---|
| A.1 | Re-run `render_visuals.py` for `mobile-1962`; if graphviz still crashes, fix `.dot` source or strip the `visual:` block | `interviews/vault/visuals/mobile/mobile-1962.svg` exists OR YAML's visual block removed | 10 min |
| A.2 | `render_visuals.py`: non-zero exit on any per-item crash; capture per-ID stderr to `_validation_results/render_failures.json` | Inject a broken `.dot` test; confirm exit code != 0 + log written | 30 min |
| A.3 | LinkML schema: type the `visual` block as a structured sub-schema. `kind` enum `[svg, png]`, `path` regex `^[a-z0-9-]+\.(svg\|png)$`, required `alt` (≥10 chars) + `caption` (≥5 chars) | LinkML codegen produces typed `Visual` class; existing 234 visual items still validate | 45 min |
| A.4 | Pydantic field-validator: `visual.path` MUST resolve to a real file in `visuals/<track>/`; reject otherwise | Unit test: YAML with `visual.path: nonexistent.svg` fails `Question.model_validate()` | 30 min |
| A.5 | Registry repair: write `tools/repair_registry.py` reading disk → appending 5,269 missing IDs as `created_by: registry-rebuild-2026-04-25`. Add comment block above the new entries documenting the rename history. Refactor `doctor.py:_check_registry_integrity` into two checks: `disk-coverage` (HARD FAIL if disk file unregistered) and `registry-history` (INFO only for retired IDs). | `vault doctor` shows `disk-coverage: pass`; `registry-history: info`. Registry is append-only (no deletions). | 1 hr |
| A.6 | **Expert-driven lint calibration** (replaces the original "empirical widen" version). See A.6.* breakdown below. | `vault lint interviews/vault/questions/` reports 0 errors / **0 warnings** | 2 hr |
| A.7 | Chain integrity: 29 single-question chains + 101 chains with non-sequential positions. Audit each → fix the chain (renumber positions / extend with siblings) or drop the chain entirely. | Pre-commit hook reports 0 chain warnings | 1.5 hr |
| A.8 | Practice page: render visual inline beside question + click-to-zoom modal using `react-medium-image-zoom`. Add Playwright test: load known-visual question, click image, verify modal opens, press ESC, verify modal closes. | Playwright count 8 → 9, all pass | 1.5 hr |
| A.9 | Cleanup verification gate: `vault check --strict` 0 errors • `vault lint` 0 warnings • `vault doctor` 0 fails • Playwright 9/9 • all 320 prior PASS items still in corpus | All five gates green | 15 min |
| A.10 | Atomic commit: `cleanup(vault): registry repair + visual schema + lint calibration + zoom UI` | Pre-commit hook passes without `--no-verify` | 5 min |

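The A.3/A.4 validation rules are concrete enough to sketch in isolation. A minimal stand-in for the field-validator logic, kept as a plain function so the rules are unit-testable without the LinkML codegen or the real `Question` model (the function name and the dict shape of the `visual:` block are assumptions):

```python
import re
from pathlib import Path

# Rules from A.3: kind enum, filename regex, minimum alt/caption lengths.
PATH_RE = re.compile(r"^[a-z0-9-]+\.(svg|png)$")
KINDS = {"svg", "png"}

def validate_visual(visual: dict, track: str, visuals_root: Path) -> list[str]:
    """Return the list of violations for one `visual:` block ([] = valid).

    In the real schema this would live in a Pydantic `@field_validator`
    on the Question model (A.4); the checks are the same.
    """
    errors = []
    if visual.get("kind") not in KINDS:
        errors.append(f"kind must be one of {sorted(KINDS)}")
    path = visual.get("path", "")
    if not PATH_RE.match(path):
        errors.append(f"path {path!r} fails regex {PATH_RE.pattern}")
    elif not (visuals_root / track / path).is_file():
        # A.4: path MUST resolve to a real file in visuals/<track>/
        errors.append(f"path {path!r} does not exist under visuals/{track}/")
    if len(visual.get("alt", "")) < 10:
        errors.append("alt must be at least 10 characters")
    if len(visual.get("caption", "")) < 5:
        errors.append("caption must be at least 5 characters")
    return errors
```

This is exactly the shape the A.4 unit test needs: feed it `{"path": "nonexistent.svg", ...}` and assert the existence violation comes back.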
### A.6 expanded — expert-driven lint calibration

| Step | Action | Acceptance |
|---|---|---|
| A.6.1 | Pull all 1,308 zone-level-affinity warns; group by (zone, level) pair; pick 3-5 representative questions per disputed pair as evidence | Manifest file `tools/lint_calibration_evidence.yaml` with ~30-50 disputed-pair samples |
| A.6.2 | Spawn 4 expert agents in **parallel**: `expert-vijay-reddi`, `expert-chip-huyen`, `expert-jeff-dean`, `education-reviewer`. Each gets the same disputed-pair manifest + the question: *"for each (zone, level) pair, is it pedagogically valid? give your reasoning."* | 4 expert reports written to `.claude/_reviews/lint-calibration-<ts>/` |
| A.6.3 | **(USER REVIEW CHECKPOINT 1)** — surface the four expert reports for user review before consolidation | User signs off |
| A.6.4 | Consolidate via `consensus-builder` agent: every (zone, level) pair gets a verdict: `accepted` (≥3 experts say valid), `rejected` (≥3 say invalid), `disputed` (split) | Consensus report with verdict per pair |
| A.6.5 | For `accepted` pairs → widen lint rule. For `rejected` pairs → reclassify the affected questions (update zone or level field, vault check still passes). For `disputed` pairs → ack-list with rationale. | Updated `zone_level_affinity.yaml` rule + reclassified items committed |
| A.6.6 | Re-run `vault lint interviews/vault/questions/` → must report **0 warnings, 0 errors** | Strict pass |

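The A.6.4 verdict rule is mechanical enough to pin down as code. A sketch of the consolidation logic only — the function name and vote representation are assumptions, not the `consensus-builder` agent's actual interface:

```python
def consensus_verdict(votes: list[bool]) -> str:
    """Map per-expert validity votes for one (zone, level) pair to a verdict.

    Rule from A.6.4 with 4 experts: `accepted` if >=3 say the pair is
    pedagogically valid, `rejected` if >=3 say invalid, else `disputed`.
    """
    valid = sum(votes)
    invalid = len(votes) - valid
    if valid >= 3:
        return "accepted"
    if invalid >= 3:
        return "rejected"
    return "disputed"
```

With 4 voters a 2-2 split is the only `disputed` case, which is what feeds the ack-list path in A.6.5.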
---

## Phase B — Full balanced generation (after A.10 lands; ~9-10 hr)

The original Phase 1 analyzer flagged 100 cells. The first run hit ~30 of those and PASS-ed at unusual cells (mobile/memory etc.) rather than the priority cells. Phase B systematically attacks the full list with prompts engineered for the actual content type needed.

| ID | Task | Acceptance criterion | Effort |
|---|---|---|---|
| B.1 | Re-run analyzer against current corpus (post-cleanup): get fresh 100-cell recommended plan | Plan file written; top 20 inspected | 5 min |
| B.2 | Cell-class triage: read the 100 cells, group by failure mode the first run revealed: `parallelism-too-shallow`, `global-L6+-too-abstract`, `healthy-fillable`. Each class gets its own prompt template. | `tools/cell_triage.md` written: list of cells × class × prompt-template ref | 1 hr |
| B.3 | Author **3 specialized generator prompts**, one per failure class. **Parallelism** prompt: requires concrete topology (NVLink, IB, PCIe, RoCE, LoRa), forbids pure bandwidth division, requires synchronization or bubble cost in the question. **Global-L6+** prompt: requires cross-track synthesis (e.g., compare same constraint in tinyml + cloud), forbids generic abstractions. **Standard** prompt: refined version of current with validate-at-write fix. | 3 prompt files in `interviews/vault/scripts/prompts/`; test invocation against each produces 5 sample drafts that pass judge | 2 hr |
| B.3' | **(USER REVIEW CHECKPOINT 2)** — surface prompt drafts for user review before B.5 | User signs off | — |
| B.4 | Add validate-at-write to `gemini_cli_generate_questions.py`: every YAML round-trips through `Question.model_validate()` before write. Failures → retry once with "your previous output had X violations" prompt. Second failure → log structured error and skip. **This is the root-cause fix for the competency_area regression.** | Unit test: feed Gemini-style malformed dict → script rejects, retries, eventually skips with structured error | 1 hr |
| B.5 | Two-stage loop: Stage 1 — 30-call run targeting all 100 cells with appropriate prompt class, batch_size 30 → ~900 drafts. Stage 2 — judge in chunks of 25; re-judge any NEEDS_FIX after one auto-fix retry pass. | Loop summary shows: drafts ≥ 800, PASS rate ≥ 60%, items in priority cells (parallelism + global L6+) ≥ 80 | 4-5 hr wall clock, 50-70 calls |
| B.6 | Stratified spot-read: 20 items across (track × prompt-class × verdict). Reject drafts that read as bandwidth-math or "standard programming pattern." | Reviewed list saved; rejection rate ≤ 15% | 30 min |
| B.7 | Promote PASS items, rebuild bundle, regen paper macros, recompile PDF | `vault check --strict` clean; corpus published count grows by 200-500; macros stamped | 30 min |

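The B.4 validate-at-write policy can be sketched generically. The validator and regenerator are injected here so the sketch stands in for `Question.model_validate()` and the Gemini retry call without importing either — the function names and return shape are assumptions, not the script's real API:

```python
from typing import Callable

def write_with_validation(
    draft: dict,
    validate: Callable[[dict], list[str]],          # returns violations; [] = clean
    regenerate: Callable[[dict, list[str]], dict],  # one retry with feedback
    write: Callable[[dict], None],
) -> dict:
    """Validate-at-write policy from B.4: validate, retry once with the
    violations fed back to the generator, then skip with a structured error."""
    violations = validate(draft)
    if not violations:
        write(draft)
        return {"status": "written", "retries": 0}
    # One retry: the regeneration prompt carries the concrete violations
    # ("your previous output had X violations").
    retried = regenerate(draft, violations)
    violations = validate(retried)
    if not violations:
        write(retried)
        return {"status": "written", "retries": 1}
    # Second failure: structured error, skip -- invalid YAML never hits disk.
    return {"status": "skipped", "retries": 1, "violations": violations}
```

The invariant worth testing is the last branch: a draft that fails twice is skipped with its violations attached, never written.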
---

## Phase C — NEEDS_FIX queue (parallel with B.5/B.6 once A.10 lands; ~2.5 hr)

This run's 120 NEEDS_FIX items each carry a specific `fix_suggestion` from the judge (see `_validation_results/coverage_loop/20260425_150712/iter_*/judge_summary.json`).

| ID | Task | Acceptance | Effort |
|---|---|---|---|
| C.1 | Aggregate the 120 NEEDS_FIX from this run + any new from Phase B into a single fix manifest with per-item `fix_suggestion` + criteria flags | Manifest file written, ≥120 entries | 15 min |
| C.2 | Spawn `general-purpose` fix-agent with `quiz-generation.md` as quality bar; agent edits each YAML in place applying the judge's specific suggestion | Each YAML modified; `vault check --strict` still passes | 1.5 hr |
| C.3 | Re-judge fixed items in a small chunked run (~3-5 calls) | Verdict distribution recorded | 30 min |
| C.4 | Promote any items that flipped to PASS | Promoted count logged | 5 min |

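C.1's aggregation is a single walk over the loop directory. A sketch of that walk — note the per-item shape (`id`, `verdict`, `fix_suggestion`, `criteria` keys inside an `items` list) is an assumption about `judge_summary.json`, not a documented schema, so the field names would need checking against a real run:

```python
import json
from pathlib import Path

def build_fix_manifest(loop_dir: Path) -> list[dict]:
    """Collect every NEEDS_FIX entry from iter_*/judge_summary.json (C.1).

    Each manifest entry keeps the judge's fix_suggestion and criteria
    flags plus the source file, so the C.2 fix-agent can work per-item.
    """
    manifest = []
    for summary_path in sorted(loop_dir.glob("iter_*/judge_summary.json")):
        for item in json.loads(summary_path.read_text()).get("items", []):
            if item.get("verdict") == "NEEDS_FIX":
                manifest.append({
                    "id": item["id"],
                    "fix_suggestion": item.get("fix_suggestion", ""),
                    "criteria": item.get("criteria", {}),
                    "source": str(summary_path),
                })
    return manifest
```

Pointing this at `_validation_results/coverage_loop/20260425_150712/` plus any Phase B loop directory gives the single combined manifest C.1 calls for.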
**Concurrency safety:** Phase C touches *existing* NEEDS_FIX YAMLs; Phase B writes *new* IDs. Different ID ranges → no write race. Neither phase may run while Phase A is in flight (schema/lint changes).

---

## Phase D — Final stable state (after B + C; ~1 hr)

| ID | Task | Acceptance | Effort |
|---|---|---|---|
| D.1 | Re-run all gates: `vault check --strict` • `vault lint` (0 warnings) • `vault doctor` (0 fails) • Playwright (9/9) • paper compile (0 LaTeX errors) • registry append-only invariant verified. Wrap as `tools/release_gate.sh`. | Single shell script returns exit 0 | 30 min |
| D.2 | **(USER REVIEW CHECKPOINT 3)** — surface final state to user. | User signs off | — |
| D.3 | Atomic final commit: `feat(vault): release-ready cleanup + balanced generation` | Pre-commit clean; branch ready for StaffML day | 10 min |

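`tools/release_gate.sh` is just sequential gate execution with a pass/fail report and a single exit code. The same pattern, sketched in Python so it sits alongside the other sketches in this plan (the gate commands in the trailing comment mirror the one-liner at the end of this document; the real script would be shell per D.1):

```python
import subprocess
import sys

def run_gates(gates: list[tuple[str, list[str]]]) -> int:
    """Run each (name, argv) gate, print PASS/FAIL per gate, and return a
    shell-style exit code: 0 only if every gate passed (the D.1 contract)."""
    failed = 0
    for name, argv in gates:
        result = subprocess.run(argv, capture_output=True)
        status = "PASS" if result.returncode == 0 else "FAIL"
        failed += result.returncode != 0
        print(f"{status}  {name}")
    return 1 if failed else 0

# The real gate list would invoke the vault CLI, Playwright, and the paper
# compile, e.g.:
# sys.exit(run_gates([
#     ("check-strict", ["python3", "-m", "vault_cli.main", "check", "--strict"]),
#     ("lint", ["python3", "-m", "vault_cli.main", "lint", "interviews/vault/questions/"]),
#     ("doctor", ["python3", "-m", "vault_cli.main", "doctor"]),
# ]))
```

Running all gates rather than stopping at the first failure keeps the report useful: one invocation shows every red gate at once.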
---

## Common saturation outcomes (mirroring prior plan)

If Phase B's loop stops early:

| Reason | Meaning | What to do |
|---|---|---|
| `top priority gap < 0.8` | Corpus is balanced enough that no cell is desperately empty | Success. Move to B.6. |
| `DROP rate > 35%` | Gemini hallucinating or cells nonsensical | Inspect the latest iter's `judge_summary.json`; add offending cells to `TRACK_TOPIC_BLOCKLIST` in `analyze_coverage_gaps.py`. Likely indicates a prompt template needs another revision. |
| `same top cell two iters in a row` | Generator cannot fill the cell | Check raw Gemini output for that cell. Likely needs an even more specialized prompt. **This is what fired in the prior run.** |
| `max-iters reached` | Hit iteration cap before saturation | Re-run with a higher `--max-iters 50` if budget allows. |
| `max-calls reached` | Burned through API budget | Stop. Ship Phase C first. |

---

## What NOT to do

- ❌ Don't merge to `dev` until Phase D passes (pre-commit hook + all gates green).
- ❌ Don't push to remote without explicit user OK.
- ❌ Don't run Phase B or C concurrent with Phase A in-flight.
- ❌ Don't add `Co-Authored-By` lines or automated attribution footers.
- ❌ Don't change schema enum values (CompetencyArea, Track, Level, Zone, Status, Provenance) without explicit user direction.
- ❌ Don't auto-promote NEEDS_FIX items without re-judge.
- ❌ Don't suppress lint warnings or skip pre-commit hooks (`--no-verify` forbidden).
- ❌ Don't relitigate the locked decisions above without explicit user direction.
- ❌ Don't navigate to or modify files in sibling worktrees (`MLSysBook`, `MLSysBook-vault-audit`, `MLSysBook-404`, `MLSysBook-labs-release`). Stay in `MLSysBook-massive-build`.
- ❌ Don't auto-cut a release tag (`v0.1.2` etc.) — single stable commit is the goal, not a release ceremony.

---

## Files of interest

| File | Why |
|---|---|
| `interviews/vault/docs/RESUME_PLAN_2026-04-25.md` | Prior session's plan (completed through Phase 7). |
| `interviews/vault/docs/MASSIVE_BUILD_RUNBOOK.md` | Methodology document — the prior session's runbook. |
| `interviews/vault/_validation_results/coverage_loop/20260425_150712/` | Last loop's per-iter judge_summary.json (PASS/NEEDS_FIX/DROP details with fix_suggestion). |
| `interviews/vault/scripts/iterate_coverage_loop.py` | Main driver. Defaults bumped this session. |
| `interviews/vault/scripts/analyze_coverage_gaps.py` | Priority ranking. |
| `interviews/vault/scripts/gemini_cli_generate_questions.py` | Batched Gemini generation. **Phase B.4 adds validate-at-write here.** |
| `interviews/vault/scripts/gemini_cli_llm_judge.py` | Multi-criteria validator. |
| `interviews/vault/scripts/render_visuals.py` | DOT/matplotlib → SVG. **Phase A.2 fixes silent-failure mode here.** |
| `interviews/vault/scripts/fix_competency_areas.py` | One-time cleanup. REMAP table extended this session. |
| `interviews/vault/scripts/promote_validated.py` | Lifecycle flip. |
| `interviews/vault-cli/src/vault_cli/commands/doctor.py` | **Phase A.5 splits `_check_registry_integrity` into two checks.** |
| `interviews/vault-cli/src/vault_cli/commands/lint.py` | **Phase A.6 updates `zone_level_affinity` rule.** |
| `interviews/vault/id-registry.yaml` | Append-only ID log. **Phase A.5 appends 5,269 missing IDs.** |
| `interviews/staffml/src/data/vault-manifest.json` | GUI's authoritative count. Refresh after every bundle build. |
| `.claude/agents/expert-*.md` | Expert agent definitions for A.6.2. |
| `.claude/agents/consensus-builder.md` | Consensus aggregator for A.6.4. |

---

## One-liner status check (run first in next session)

```bash
cd /Users/VJ/GitHub/MLSysBook-massive-build && \
git log --oneline -3 && echo "---" && \
git status --short | head -10 && echo "---" && \
PYTHONPATH=interviews/vault-cli/src \
python3 -m vault_cli.main check --strict 2>&1 | tail -3 && \
echo "---" && \
PYTHONPATH=interviews/vault-cli/src \
python3 -m vault_cli.main lint interviews/vault/questions/ 2>&1 | tail -3 && \
echo "---" && \
PYTHONPATH=interviews/vault-cli/src \
python3 -m vault_cli.main doctor 2>&1 | tail -10 && \
echo "---" && \
python3 -c "
import json
c = json.load(open('interviews/staffml/src/data/corpus.json'))
print(f'published: {len(c)}')
"
```

If the output shows commit `ece6eccf2`, a clean tree, `vault check` passing, lint reporting 1,308 warnings, and doctor showing the registry fail (5,269/4,479), the resume state is healthy and matches this plan's starting assumptions. **Proceed to Phase A.1.**

If something differs, **stop and reconcile** before starting work.

---

## Pacing

This is a ~17-19 hour push, plausibly 2 focused days or 3-4 calendar days with breaks. The work is heavy on prompt engineering (B.3) and data cleanup (A.6, A.7). Don't rush; the gates are the contract.

There are three explicit user-review checkpoints (A.6.3, B.3', D.2). Wait for sign-off at each before continuing.