mirror of
https://github.com/harvard-edge/cs249r_book.git
synced 2026-05-06 17:49:07 -05:00
refactor(staffml): retire prod static-fallback; opt-in dev-only (#1598)
The bundled corpus.json was serving as a prod safety net behind the
Cloudflare Worker. Post-cutover the Worker has been the real data
source, and the static path was silently degrading rather than helping
(corpus.json is a generated artifact whose prose `details` are blank
in corpus-summary.json). This change:
- Stops emitting corpus.json in the publish-live workflow
- Removes the Worker-error fallback in getQuestionFullDetail — errors
now propagate to useFullQuestion and the UI shows a "details
unavailable" banner instead of silently filling blanks
- Drops the localhost auto-trigger in shouldUseStaticDetails — the
static path now requires explicit NEXT_PUBLIC_VAULT_FALLBACK=static
- Switches taxonomy.ts to corpus-summary.json (was corpus.json)
- Rewrites the publish-live smoke tests against corpus-summary.json
- Collapses validate-vault.py to sparse-only (per-question deep
validation lives in `vault check --strict`)
Static-fallback remains as an OPT-IN local-dev affordance: set
NEXT_PUBLIC_VAULT_FALLBACK=static and run `vault build --legacy-json`
to materialize corpus.json. The Function-constructor dynamic import
keeps Turbopack from requiring corpus.json at build time.
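The Function-constructor trick mentioned above can be sketched as follows. This is a minimal illustration, not the repo's exact code; `loadStaticCorpus` is a hypothetical helper name. Because the import specifier is assembled at runtime inside a `Function` body, the bundler's static analyzer never sees it, so the build does not require the file to exist:

```typescript
// Hypothetical sketch of the Function-constructor dynamic import.
// The specifier is hidden from static analysis; if the file was never
// materialized, the import rejects and we map that to null.
const dynImport = new Function("p", "return import(p)") as (
  p: string,
) => Promise<{ default: unknown[] }>;

async function loadStaticCorpus(path: string): Promise<unknown[] | null> {
  try {
    const mod = await dynImport(path); // resolves only if the JSON exists on disk
    return mod.default;
  } catch {
    return null; // missing file: caller surfaces "details unavailable"
  }
}
```

A caller would treat `null` as "no static corpus available" rather than silently substituting blank details.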
useFullQuestion hook signature changed from `Question | undefined` to
`{ question, status }`. Callers updated: practice and plans pages
(both render an amber "details unavailable" banner when status
is 'error').
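The new hook contract can be sketched like this. Only the `{ question, status }` shape comes from the commit; the `HydrationStatus` values and the `pickRenderState` helper are illustrative assumptions:

```typescript
// Illustrative sketch of the changed hook contract (status values and
// pickRenderState are assumptions, not the repo's actual identifiers).
type HydrationStatus = "loading" | "ready" | "error";

interface FullQuestionResult<Q> {
  question: Q | undefined; // hydrated full question once the Worker responds
  status: HydrationStatus;
}

// Callers render the summary until hydration lands, and show the amber
// banner on failure instead of silently filling blanks.
function pickRenderState<Q>(
  summary: Q,
  res: FullQuestionResult<Q>,
): { current: Q; showBanner: boolean } {
  return {
    current: res.question ?? summary,
    showBanner: res.status === "error",
  };
}
```

This keeps the old `hydrated ?? summary` fallback for rendering while making the error state explicit rather than implicit.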
Deleted dead cutover scaffolding: corpus-source.ts (router with no UI
consumers), corpus-vault.ts (worker-only mirror, never wired up),
useVaultQuestion.ts (unused migration hook), vault-fallback.ts (only
consumer was corpus-source.ts).
Deleted stale docs: staffml/scripts/DEPRECATED.md, vault-cli/docs/
CUTOVER_QA.md, three vault/docs/RESUME_PLAN_*.md.
Verified locally: tsc clean, vitest 37/37, next build produces all
15 static routes.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Committed by GitHub
parent 6e4ab3f779
commit c824ac6ed1

.github/workflows/staffml-publish-live.yml (vendored, 26 changed lines)
@@ -88,13 +88,6 @@ jobs:
      - name: 🛠️ Install vault-cli
        run: pip install -e interviews/vault-cli/

      - name: 🔄 Regenerate corpus from YAMLs (vault build --legacy-json)
        # Ships interviews/staffml/src/data/corpus.json (full) +
        # corpus-summary.json (bundled by the site) straight from the
        # committed YAMLs. The site always deploys with current YAML state,
        # even if the committed JSON artifacts drifted on the last commit.
        run: vault build --vault-dir interviews/vault --release-id publish-live --legacy-json

      - name: 🔍 Type check
        working-directory: interviews/staffml
        run: npx tsc --noEmit

@@ -115,9 +108,9 @@ jobs:
          # required on /ask calls from the AskInterviewer panel.
          NEXT_PUBLIC_INTERVIEWER_ENDPOINT: https://mlsysbook.ai/api/staffml-interviewer
          # Vault cutover (PRs #1433, #1434): site reads live data from the
          # Cloudflare Worker in production; bundled corpus.json is the
          # offline fallback. NEXT_PUBLIC_VAULT_FALLBACK is intentionally
          # unset so vault-fallback.ts defaults to 'vault-api'.
          # Cloudflare Worker in production. There is no static rollback in
          # prod — corpus.json is neither emitted nor bundled. If the Worker
          # is unreachable, the UI surfaces a "details unavailable" banner.
          NEXT_PUBLIC_VAULT_API: https://staffml-vault.mlsysbook-ai-account.workers.dev
          NEXT_PUBLIC_VAULT_RELEASE: "1.0.2"
        run: npm run build
@@ -164,20 +157,25 @@ jobs:
        run: python3 interviews/staffml/scripts/validate-vault.py

      - name: 🧪 Smoke tests
        # Reads the bundled corpus-summary.json (committed) — the prod build
        # ships this as the synchronous catalog. Heavy fields (scenario,
        # details prose) live on the Worker and are not re-validated here;
        # `vault check --strict` covers per-question YAML validation in the
        # validate-vault workflow.
        run: |
          python3 -c "
          import json, sys, os
          import json

          with open('interviews/staffml/src/data/corpus.json') as f:
          with open('interviews/staffml/src/data/corpus-summary.json') as f:
              corpus = json.load(f)
          assert len(corpus) >= 4000, f'Corpus too small: {len(corpus)}'
          print(f'✅ Corpus: {len(corpus)} questions')

          required = ['id', 'title', 'level', 'track', 'scenario', 'competency_area', 'topic', 'zone', 'details']
          required = ['id', 'title', 'level', 'track', 'competency_area', 'topic', 'zone']
          for q in corpus:
              for f in required:
                  assert q.get(f), f'{q.get(\"id\", \"???\")} missing {f}'
          print('✅ All questions have required fields')
          print('✅ All questions have required structural fields')

          valid_levels = {'L1', 'L2', 'L3', 'L4', 'L5', 'L6', 'L6+'}
          for q in corpus:
@@ -17,9 +17,16 @@ NEXT_PUBLIC_VAULT_API=https://staffml-vault.mlsysbook-ai-account.workers.dev
# mismatch surfaces in X-Vault-Release SLI but still serves.
NEXT_PUBLIC_VAULT_RELEASE=1.0.2

# Data-source switch:
#   unset or 'vault' → worker-primary, bundled corpus.json as fallback (DEFAULT)
#   'static'         → bundled-only (rollback / offline / worker-unreachable dev)
# The bundled corpus.json is preserved on disk as a safety net — it is not
# deleted, but the site reads from the worker when it's reachable.
# OPT-IN offline dev mode (local-only — production never sets this):
#   unset (DEFAULT) → site reads details from the Worker. If the Worker is
#                     unreachable, detail prose is omitted and the UI shows
#                     a "details unavailable" banner.
#   'static'        → site reads details from a bundled corpus.json instead
#                     of the Worker. Requires materializing corpus.json
#                     locally first:
#                         vault build --vault-dir interviews/vault \
#                           --release-id local-dev --legacy-json
#                     Use this when working offline or against an unreachable
#                     Worker. Production deploys neither emit nor bundle
#                     corpus.json — there is no static rollback path in prod.
# NEXT_PUBLIC_VAULT_FALLBACK=static
@@ -1,36 +0,0 @@
# Deprecated scripts — `interviews/staffml/scripts/`

These pre-date the YAML migration (ARCHITECTURE.md v2.x, Phase 1). They ran
against the monolithic `interviews/vault/corpus.json` (now a generated
artifact) or pushed data into `src/data/corpus.json` (now emitted by
`vault build --legacy-json`).

## Replaced-by map

| Legacy script | Purpose | Replacement |
|---|---|---|
| `sync-vault.py` | Copied vault/corpus.json → src/data/ with filter | `vault build --legacy-json` emits site-compatible JSON directly |
| `generate-manifest.py` | Built src/data/vault-manifest.json | Built by `vault publish` as a release artifact |
| `validate-vault.py` | Sanity check on corpus shape | Covered by `vault check --strict` invariants |
| `format-napkin-math.py` | One-shot formatter | Obsolete |
| `sync-periodic-table.mjs` | Unrelated (periodic-table site feature) | Still active — NOT deprecated |

## Current flow

```bash
vault build --legacy-json   # from repo root
# Regenerates:
#   interviews/staffml/src/data/corpus.json (9199 questions, site-compatible shape)
#   interviews/vault/vault.db (25 MB SQLite build artifact)
# Verifies release_hash against corpus-equivalence-hash.txt
```

The site layout has NOT changed: `corpus.ts` still does
`import corpusData from '../data/corpus.json'`. The only difference is that
`corpus.json` is now derived from YAML rather than hand-edited — a
pre-commit hook refuses direct edits to it.

Phase-4 cutover replaces the bundled JSON with Worker-API reads via
`corpus-source.ts` + `vault-api.ts`. That's a separate step;
`corpus.json` stays through at least 2 post-cutover releases as the
rollback fallback.
@@ -1,15 +1,16 @@
#!/usr/bin/env python3
"""Validate vault data integrity for StaffML deployment.
"""Sparse vault sanity check for the StaffML deploy.

When ``corpus.json`` is present (e.g. after ``vault build --legacy-json``), runs
full cross-checks against taxonomy and manifest.
Validates the small committed metadata files that ship in the repo:
``taxonomy.json`` and ``vault-manifest.json``. Confirms taxonomy has
concepts, manifest has a question count, and track distributions add up.

When ``corpus.json`` is absent — the normal case for a clean clone after
2026-04-26, when corpus was retired as a tracked file — runs **sparse** checks
only: committed ``taxonomy.json`` and ``vault-manifest.json`` must load and
look self-consistent. Full per-question validation is expected from
``vault check --strict`` in CI (``staffml-validate-vault.yml``) and from this
script after a local or CI ``vault build -- ... --legacy-json``.
Per-question deep validation (schema, chain integrity, math, etc.) is
covered by ``vault check --strict`` (run in CI via
``staffml-validate-vault.yml``), which validates directly against the
YAML source files in ``interviews/vault/`` rather than a generated JSON
artifact. This script is the cheap pre-deploy gate; ``vault check`` is
the authoritative one.

Exit code 0 = all checks pass, 1 = errors found.
@@ -18,7 +19,6 @@ Usage: python3 interviews/staffml/scripts/validate-vault.py

import json
import sys
from collections import Counter
from pathlib import Path

STAFFML_DATA = Path(__file__).parent.parent / "src" / "data"
@@ -41,12 +41,14 @@ def ok(msg: str) -> None:
    print(f" ✅ {msg}")


def run_sparse_validation(taxonomy_path: Path, manifest_path: Path) -> int:
    """Validate committed JSON when the full bundled corpus is not on disk."""
    print("\n🔍 Sparse mode (no corpus.json)")
def main() -> int:
    taxonomy_path = STAFFML_DATA / "taxonomy.json"
    manifest_path = STAFFML_DATA / "vault-manifest.json"

    print("\n🔍 Sparse vault check (committed metadata only)")
    print(
        " Per-question checks require a build artifact. Regenerate with:\n"
        " vault build --vault-dir interviews/vault --release-id <id> --legacy-json\n"
        " Per-question deep validation lives in `vault check --strict` "
        "(staffml-validate-vault.yml).\n"
    )

    if not taxonomy_path.exists():
@@ -88,7 +90,6 @@ def run_sparse_validation(taxonomy_path: Path, manifest_path: Path) -> int:
        ok(f"Vault v{ver} — hash {h}")

    print(f"\n{'=' * 50}")
    print(f" Mode: sparse (no corpus.json)")
    print(f" Errors: {len(errors)}")
    print(f" Warnings: {len(warnings)}")
    print(f"{'=' * 50}")
@@ -97,213 +98,13 @@ def run_sparse_validation(taxonomy_path: Path, manifest_path: Path) -> int:
        print("\n❌ Sparse validation failed")
        return 1
    print(
        "\n🎯 Sparse checks passed — for full deploy-grade validation, run vault build "
        "--legacy-json and re-run this script, or rely on staffml-validate-vault (CI)."
        "\n🎯 Sparse checks passed — for deep per-question validation run "
        "`vault check --strict` (or rely on staffml-validate-vault in CI)."
    )
    if warnings:
        print(f" ({len(warnings)} warnings — review recommended)")
    return 0


# ── 1. Load data ─────────────────────────────────────────────

corpus_path = STAFFML_DATA / "corpus.json"
taxonomy_path = STAFFML_DATA / "taxonomy.json"
manifest_path = STAFFML_DATA / "vault-manifest.json"

if not corpus_path.exists():
    sys.exit(run_sparse_validation(taxonomy_path, manifest_path))

if not taxonomy_path.exists():
    print(f" ❌ taxonomy.json not found at {taxonomy_path}", file=sys.stderr)
    sys.exit(1)

print("\n🔍 Loading data files...")

with open(corpus_path, encoding="utf-8") as f:
    corpus = json.load(f)
with open(taxonomy_path, encoding="utf-8") as f:
    taxonomy = json.load(f)

manifest = None
if manifest_path.exists():
    with open(manifest_path, encoding="utf-8") as f:
        manifest = json.load(f)

ok(f"Loaded {len(corpus)} questions, {len(taxonomy.get('concepts', []))} concepts")

# ── 2. Schema checks ─────────────────────────────────────────

print("\n📋 Schema validation...")

REQUIRED_FIELDS = [
    "id",
    "title",
    "level",
    "track",
    "scenario",
    "competency_area",
    "details",
]
VALID_LEVELS = {"L1", "L2", "L3", "L4", "L5", "L6", "L6+"}
VALID_TRACKS = {"cloud", "edge", "mobile", "tinyml", "global"}
DETAIL_FIELDS = ["common_mistake", "realistic_solution"]

missing_fields = 0
bad_levels = 0
bad_tracks = 0
short_scenarios = 0
empty_answers = 0

for q in corpus:
    qid = q.get("id", "???")

    for field in REQUIRED_FIELDS:
        if not q.get(field):
            error(f"{qid}: missing required field '{field}'")
            missing_fields += 1

    if q.get("level") not in VALID_LEVELS:
        error(f"{qid}: invalid level '{q.get('level')}'")
        bad_levels += 1

    if q.get("track") not in VALID_TRACKS:
        error(f"{qid}: invalid track '{q.get('track')}'")
        bad_tracks += 1

    scenario = q.get("scenario", "")
    if len(scenario.strip()) < 30:
        warn(f"{qid}: scenario too short ({len(scenario)} chars)")
        short_scenarios += 1

    details = q.get("details", {})
    for df in DETAIL_FIELDS:
        if not details.get(df) or len(str(details.get(df, "")).strip()) < 5:
            warn(f"{qid}: details.{df} empty or too short")
            empty_answers += 1

if missing_fields == 0 and bad_levels == 0 and bad_tracks == 0:
    ok("All questions have valid required fields, levels, and tracks")
else:
    error(
        f"{missing_fields} missing fields, {bad_levels} bad levels, {bad_tracks} bad tracks"
    )

# ── 3. Uniqueness checks ─────────────────────────────────────

print("\n🔑 Uniqueness checks...")

ids = [q["id"] for q in corpus]
id_counts = Counter(ids)
dupes = {k: v for k, v in id_counts.items() if v > 1}
if dupes:
    error(f"{len(dupes)} duplicate IDs: {list(dupes.keys())[:5]}")
else:
    ok(f"All {len(ids)} question IDs are unique")

# ── 4. Taxonomy consistency ──────────────────────────────────

print("\n🏷️ Taxonomy consistency...")

concepts = {c["id"] for c in taxonomy.get("concepts", [])}
corpus_concepts = {q.get("taxonomy_concept") for q in corpus if q.get("taxonomy_concept")}
unmapped = corpus_concepts - concepts

if unmapped:
    warn(f"{len(unmapped)} corpus concepts not in taxonomy: {list(unmapped)[:5]}")
else:
    ok(f"All {len(corpus_concepts)} corpus concepts exist in taxonomy")

corpus_areas = Counter(q.get("competency_area", "???") for q in corpus)
ok(f"{len(corpus_areas)} competency areas in use")

# ── 5. Chain integrity ───────────────────────────────────────

print("\n🔗 Chain integrity...")

chains: dict[str, list] = {}
for q in corpus:
    cids = q.get("chain_ids", "")
    if isinstance(cids, list):
        for cid in cids:
            if cid:
                chains.setdefault(cid, []).append(q)
    elif cids:
        chains.setdefault(cids, []).append(q)

solo_chains = sum(1 for c in chains.values() if len(c) <= 1)
if solo_chains > 0:
    warn(f"{solo_chains} single-question chains (should be 2+)")

duplicate_chains = 0
for cid, qs in chains.items():
    pos_list = []
    for q in qs:
        cp = q.get("chain_positions", -1)
        if isinstance(cp, dict):
            pos_list.append(int(cp.get(cid, -1)))
        else:
            pos_list.append(int(cp) if cp != "" else -1)
    if len(pos_list) != len(set(pos_list)):
        duplicate_chains += 1
        if duplicate_chains <= 3:
            warn(f"Chain '{cid}': duplicate positions {sorted(pos_list)}")

if duplicate_chains == 0:
    ok(f"All {len(chains)} chains have unique positions")
else:
    warn(f"{duplicate_chains} chains have duplicate positions")

# ── 6. Manifest consistency ──────────────────────────────────

print("\n📦 Manifest consistency...")

if manifest:
    if manifest.get("questionCount") != len(corpus):
        error(
            f"Manifest says {manifest['questionCount']} questions, corpus has {len(corpus)}"
        )
    else:
        ok(f"Manifest matches corpus: {len(corpus)} questions")

    if manifest.get("chainCount") != len(chains):
        warn(
            f"Manifest says {manifest['chainCount']} chains, found {len(chains)}"
        )

    ok(f"Vault v{manifest.get('version', '?')} — hash {manifest.get('contentHash', '?')}")
else:
    warn("No vault-manifest.json found — run vault build --legacy-json")

# ── 7. Distribution sanity ───────────────────────────────────

print("\n📊 Distribution sanity...")

level_dist = Counter(q.get("level") for q in corpus)
track_dist = Counter(q.get("track") for q in corpus)

for track, count in track_dist.items():
    pct = count / len(corpus) * 100
    if pct < 2:
        warn(f"Track '{track}' has only {count} questions ({pct:.1f}%)")

ok(f"Levels: {dict(sorted(level_dist.items()))}")
ok(f"Tracks: {dict(sorted(track_dist.items()))}")

# ── Summary ──────────────────────────────────────────────────

print(f"\n{'=' * 50}")
print(f" Questions: {len(corpus)}")
print(f" Chains: {len(chains)}")
print(f" Concepts: {len(concepts)}")
print(f" Errors: {len(errors)}")
print(f" Warnings: {len(warnings)}")
print(f"{'=' * 50}")

if errors:
    print(f"\n❌ {len(errors)} errors found — vault is NOT deployment-ready")
    sys.exit(1)
print("\n🎯 All checks passed — vault is deployment-ready")
if warnings:
    print(f" ({len(warnings)} warnings — review recommended)")
sys.exit(0)
if __name__ == "__main__":
    sys.exit(main())
@@ -54,7 +54,8 @@ export default function PlansPage() {
  };

  const currentSummary = questions[currentIdx];
  const current = useFullQuestion(currentSummary) ?? currentSummary;
  const { question: hydrated, status: hydrationStatus } = useFullQuestion(currentSummary);
  const current = hydrated ?? currentSummary;
  const maxScore = napkinResult?.maxSelfScore ?? 3;

  const handleReveal = () => {

@@ -217,6 +218,11 @@ export default function PlansPage() {
            <span className="text-[10px] font-mono text-textTertiary">{current.track} / {current.level}</span>
          </div>
          <h2 className="text-2xl lg:text-3xl font-bold text-textPrimary mb-6 tracking-tight">{current.title}</h2>
          {hydrationStatus === "error" && (
            <div className="mb-4 rounded-md border border-amber-300 bg-amber-50 px-3 py-2 text-sm text-amber-800">
              Could not load the full question details. Reload to retry.
            </div>
          )}
          <div className="prose max-w-none">
            {current.scenario ? (
              <p className="text-textSecondary leading-relaxed text-base">{cleanScenario(current.scenario)}</p>
@@ -183,7 +183,8 @@ function PracticePage() {
  // (no scenario/details). `current` is hydrated from the worker via
  // useFullQuestion — same shape, but scenario + details populated.
  const [currentSummary, setCurrentSummary] = useState<Question | null>(null);
  const current = useFullQuestion(currentSummary) ?? currentSummary;
  const { question: hydrated, status: hydrationStatus } = useFullQuestion(currentSummary);
  const current = hydrated ?? currentSummary;
  const setCurrent = setCurrentSummary;
  const skipFilterCount = useRef(0);
  const questionShownAt = useRef(Date.now());

@@ -1056,6 +1057,14 @@ function PracticePage() {
                  {current.title}
                </h2>

                {hydrationStatus === "error" && (
                  <div className="mb-4 rounded-md border border-amber-300 bg-amber-50 px-3 py-2 text-sm text-amber-800">
                    Could not load the full question details. The
                    question prompt is shown, but scenario and answer
                    notes are unavailable. Reload to retry.
                  </div>
                )}

                {/*
                  STICKY Your-task callout. Pins to the top of the
                  scroll container so the question stays visible
@@ -1,20 +1,23 @@
"use client";

/**
 * CorpusProvider — Phase-4 hybrid data layer.
 * CorpusProvider — hybrid data layer.
 *
 * The bundled corpus.json remains the primary data source for synchronous
 * operations (getQuestions, getQuestionsByFilter, etc.). The Worker API
 * enhances two specific operations:
 * The bundled `corpus-summary.json` is the primary data source for
 * synchronous operations (getQuestions, getQuestionsByFilter, taxonomy,
 * navigation). Heavy fields (scenario, details prose) come from the
 * Cloudflare Worker via vault-api.ts.
 *
 * The Worker enhances two specific operations:
 *
 * 1. **Search** — FTS5 full-text search via /search endpoint replaces the
 *    client-side O(n) string matching.
 * 2. **Service worker registration** — enables offline caching of API
 *    responses for future full-API cutover.
 *    responses for the per-question detail fetches.
 *
 * When NEXT_PUBLIC_VAULT_API is set and NEXT_PUBLIC_VAULT_FALLBACK is not
 * "static", the provider registers the service worker and exposes the
 * vault-enhanced search. Otherwise everything falls back silently.
 * NEXT_PUBLIC_VAULT_FALLBACK=static is an OPT-IN local-dev affordance for
 * working without a reachable Worker (requires `vault build --legacy-json`
 * to materialize corpus.json). Production never sets it.
 */

import { createContext, useContext, useEffect, useState, useCallback, type ReactNode } from "react";
@@ -1,56 +0,0 @@
/**
 * Corpus data-source switch (Phase-4 cutover router).
 *
 * Components that want to be cutover-aware import from this module instead of
 * ``corpus.ts``. Returns the vault-API-backed path when
 * ``NEXT_PUBLIC_VAULT_FALLBACK`` is NOT 'static', falls back to the bundled
 * path otherwise.
 *
 * Components untouched by the cutover continue importing ``corpus.ts`` directly
 * (unchanged behavior) until the user is ready to flip them. This keeps the
 * Phase-4 cutover reviewable one component at a time.
 */

import { usingFallback } from "./vault-fallback";
import * as legacy from "./corpus";
import * as vault from "./corpus-vault";

export function getCorpusSource(): "static" | "vault-api" {
  return usingFallback() ? "static" : "vault-api";
}

export async function getQuestionById(id: string): Promise<unknown | null> {
  if (usingFallback()) {
    const qs = legacy.getQuestions();
    return qs.find(q => q.id === id) ?? null;
  }
  return vault.getQuestionById(id);
}

export async function listQuestions(
  params: { track?: string; level?: string; zone?: string; limit?: number } = {},
): Promise<unknown[]> {
  if (usingFallback()) {
    let qs = legacy.getQuestions() as any[];
    if (params.track) qs = qs.filter(q => q.track === params.track);
    if (params.level) qs = qs.filter(q => q.level === params.level);
    if (params.zone) qs = qs.filter(q => q.zone === params.zone);
    if (params.limit) qs = qs.slice(0, params.limit);
    return qs;
  }
  return vault.listQuestions(params);
}

export async function searchQuestions(q: string, limit = 20): Promise<unknown[]> {
  if (usingFallback()) {
    const qs = legacy.getQuestions() as any[];
    const needle = q.toLowerCase();
    return qs
      .filter(item =>
        (item.title ?? "").toLowerCase().includes(needle)
        || (item.scenario ?? "").toLowerCase().includes(needle)
      )
      .slice(0, limit);
  }
  return vault.searchQuestions(q, limit);
}
@@ -1,161 +0,0 @@
/**
 * Vault-API-backed corpus data source.
 *
 * Mirror of the public surface of ``corpus.ts`` but sourced from the
 * staffml-vault Worker via ``vault-api.ts`` instead of the bundled
 * ``corpus.json``. Not wired into any component until cutover — the
 * switch happens via ``corpus-source.ts``.
 *
 * Post-v1.0 (2026-04-21): the vault schema now carries track/level/zone
 * as YAML fields and uses plural `chains: [{id, position}]`, so this
 * adapter's job shrinks considerably. The defaulting to
 * `track='global'`/`level='l1'`/`zone='recall'` that existed here was
 * exactly the silent-mis-classification pattern that hid the v0.1
 * migration bug; those defaults are gone.
 */

import type { Question as VaultQuestion } from "@staffml/vault-types";
import { makeClientFromEnv, VaultApiClient } from "./vault-api";

// v1.0: classification lives on the Question itself.
type EnrichedVaultQuestion = VaultQuestion & {
  track: string;
  level: string;
  zone: string;
  competency_area: string;
  bloom_level?: string;
  phase?: string;
  question?: string;
  visual?: {
    kind: "svg"; // closed enum as of v0.1.2 (mermaid retired)
    path: string;
    alt: string; // ≥10 chars (a11y)
    caption: string; // required as of v0.1.2, ≥5 chars
  };
  chains?: Array<{ id: string; position: number }>;
  validated?: boolean;
  math_verified?: boolean;
  human_reviewed?: {
    status: string;
    by?: string | null;
    date?: string | null;
  };
};

// Shape the UI already expects (see corpus.ts).
export interface Question {
  id: string;
  track: string;
  level: string;
  title: string;
  topic: string;
  zone: string;
  competency_area: string;
  bloom_level?: string;
  phase?: string;
  scenario: string;
  question?: string;
  visual?: {
    kind: "svg"; // closed enum as of v0.1.2 (mermaid retired)
    path: string;
    alt: string; // ≥10 chars (a11y)
    caption: string; // required as of v0.1.2, ≥5 chars
  };
  chain_ids?: string[];
  chain_positions?: Record<string, number>;
  details: {
    common_mistake: string;
    realistic_solution: string;
    napkin_math?: string;
  };
  validated?: boolean;
  math_verified?: boolean;
  human_reviewed?: {
    status: string;
    by?: string | null;
    date?: string | null;
  };
}

function adapt(v: EnrichedVaultQuestion): Question {
  // Rebuild legacy chain_ids + chain_positions from the plural `chains` list.
  const chainIds: string[] = [];
  const chainPositions: Record<string, number> = {};
  for (const c of v.chains ?? []) {
    chainIds.push(c.id);
    chainPositions[c.id] = c.position;
  }
  return {
    id: v.id,
    track: v.track,
    level: v.level,
    title: v.title,
    topic: v.topic,
    zone: v.zone,
    competency_area: v.competency_area,
    bloom_level: v.bloom_level,
    phase: v.phase,
    scenario: v.scenario,
    question: v.question,
    visual: v.visual,
    chain_ids: chainIds.length ? chainIds : undefined,
    chain_positions: chainIds.length ? chainPositions : undefined,
    details: {
      common_mistake: v.details.common_mistake ?? "",
      realistic_solution: v.details.realistic_solution,
      napkin_math: v.details.napkin_math,
    },
    validated: v.validated,
    math_verified: v.math_verified,
    human_reviewed: v.human_reviewed,
  };
}

let _client: VaultApiClient | null | undefined = undefined;
function client(): VaultApiClient {
  if (_client === undefined) _client = makeClientFromEnv();
  if (_client === null) {
    throw new Error(
      "NEXT_PUBLIC_VAULT_API is not set. Point it at the worker or set "
      + "NEXT_PUBLIC_VAULT_FALLBACK=static to use the bundled corpus.",
    );
  }
  return _client;
}

// In-memory cache; SWR (in real consumption via hooks) layers on top.
const _byId = new Map<string, Question>();

export async function getQuestionById(id: string): Promise<Question | null> {
  if (_byId.has(id)) return _byId.get(id)!;
  try {
    const v = await client().getQuestion(id);
    const q = adapt(v as EnrichedVaultQuestion);
    _byId.set(id, q);
    return q;
  } catch {
    return null;
  }
}

export async function listQuestions(params: {
  track?: string; level?: string; zone?: string; limit?: number;
} = {}): Promise<Question[]> {
  const res = await client().listQuestions(params);
  return (res.items as EnrichedVaultQuestion[]).map(adapt);
}

export async function searchQuestions(q: string, limit = 20): Promise<Question[]> {
  const res = await client().search(q, limit);
  return (res.results as EnrichedVaultQuestion[]).map(adapt);
}

/**
 * Synchronous getQuestions() — compatibility shim for legacy call sites that
 * expect an array rather than a Promise. Returns the currently-cached set
 * (populated by prior async calls). Callers doing full-corpus scans must
 * migrate to listQuestions().
 */
export function getQuestions(): Question[] {
  return Array.from(_byId.values());
}
@@ -429,17 +429,27 @@ const VAULT_API = process.env.NEXT_PUBLIC_VAULT_API
const _detailsCache = new Map<string, Question>();
let _staticDetailsCache: Map<string, Question> | null = null;

// Opt-in offline / local-dev mode. Set NEXT_PUBLIC_VAULT_FALLBACK=static and
// run `vault build --legacy-json` to materialize corpus.json on disk. Not a
// prod safety net: production deploys neither emit nor bundle corpus.json.
function shouldUseStaticDetails(): boolean {
  if (process.env.NEXT_PUBLIC_VAULT_FALLBACK?.toLowerCase() === "static") return true;
  if (typeof window === "undefined") return false;
  return window.location.hostname === "localhost" || window.location.hostname === "127.0.0.1";
  return process.env.NEXT_PUBLIC_VAULT_FALLBACK?.toLowerCase() === "static";
}

async function getStaticFullDetail(id: string, summary: Question): Promise<Question | undefined> {
  if (!_staticDetailsCache) {
    const mod = await import("../data/corpus.json");
    const fullQuestions = mod.default as unknown as Question[];
    _staticDetailsCache = new Map(fullQuestions.map((q) => [q.id, q]));
    // Function-constructor dynamic import: hides the path from Turbopack's
    // static analyzer so prod builds don't require corpus.json to exist.
    // corpus.json is materialized on disk only when a contributor runs
    // `vault build --legacy-json` locally with NEXT_PUBLIC_VAULT_FALLBACK=
    // static. If the file is missing at runtime, the import rejects and
    // the caller surfaces an error to the UI.
    const dynImport = new Function(
      "p",
      "return import(p)",
    ) as (p: string) => Promise<{ default: Question[] }>;
    const mod = await dynImport("../data/corpus.json");
    _staticDetailsCache = new Map(mod.default.map((q) => [q.id, q]));
  }
  const full = _staticDetailsCache.get(id);
  if (!full) return undefined;

@@ -457,8 +467,10 @@ async function getStaticFullDetail(id: string, summary: Question): Promise<Quest

/**
 * Fetch the FULL question (with `scenario` and `details.*`) from the
 * Cloudflare Worker. Returns the summary-only record on network failure
 * so the UI can still render id/title/level/zone.
 * Cloudflare Worker. Returns the cache-merged Question on success.
 * Throws on Worker error — useFullQuestion catches and renders the
 * "details unavailable" state. (Static fallback is opt-in via
 * NEXT_PUBLIC_VAULT_FALLBACK=static and is handled earlier.)
 */
export async function getQuestionFullDetail(id: string): Promise<Question | undefined> {
  const cached = _detailsCache.get(id);

@@ -471,51 +483,44 @@ export async function getQuestionFullDetail(id: string): Promise<Question | unde
    return getStaticFullDetail(id, summary);
  }

  try {
const res = await fetch(`${VAULT_API}/questions/${encodeURIComponent(id)}`, {
|
||||
signal: AbortSignal.timeout(5_000),
|
||||
});
|
||||
if (!res.ok) return (await getStaticFullDetail(id, summary)) ?? summary;
|
||||
// Worker returns a DENORMALIZED row (flat fields straight from the D1
|
||||
// questions table) — common_mistake / realistic_solution / napkin_math
|
||||
// live at the top level, NOT under `details`. Re-nest to match the
|
||||
// site's Question shape before returning, otherwise callers get
|
||||
// `current.details.napkin_math` → TypeError on an undefined details.
|
||||
const full = await res.json() as {
|
||||
scenario?: string;
|
||||
common_mistake?: string;
|
||||
realistic_solution?: string;
|
||||
napkin_math?: string;
|
||||
details?: Question["details"]; // future-proof if worker changes
|
||||
};
|
||||
const workerDetails = full.details ?? {
|
||||
common_mistake: full.common_mistake ?? "",
|
||||
realistic_solution: full.realistic_solution ?? "",
|
||||
napkin_math: full.napkin_math ?? "",
|
||||
};
|
||||
const merged: Question = {
|
||||
...summary,
|
||||
scenario: full.scenario ?? summary.scenario,
|
||||
details: {
|
||||
// Preserve MCQ options/correct_index that came in the summary.
|
||||
...summary.details,
|
||||
...workerDetails,
|
||||
},
|
||||
};
|
||||
_detailsCache.set(id, merged);
|
||||
return merged;
|
||||
} catch {
|
||||
// Worker unreachable → serve the bundled full corpus when available.
|
||||
// This keeps local previews usable even when the Worker blocks localhost
|
||||
// via CORS, and gives production a graceful fallback on transient outages.
|
||||
return (await getStaticFullDetail(id, summary)) ?? summary;
|
||||
}
|
||||
const res = await fetch(`${VAULT_API}/questions/${encodeURIComponent(id)}`, {
|
||||
signal: AbortSignal.timeout(5_000),
|
||||
});
|
||||
if (!res.ok) throw new Error(`worker ${res.status}`);
|
||||
// Worker returns a DENORMALIZED row (flat fields straight from the D1
|
||||
// questions table) — common_mistake / realistic_solution / napkin_math
|
||||
// live at the top level, NOT under `details`. Re-nest to match the
|
||||
// site's Question shape before returning, otherwise callers get
|
||||
// `current.details.napkin_math` → TypeError on an undefined details.
|
||||
const full = await res.json() as {
|
||||
scenario?: string;
|
||||
common_mistake?: string;
|
||||
realistic_solution?: string;
|
||||
napkin_math?: string;
|
||||
details?: Question["details"]; // future-proof if worker changes
|
||||
};
|
||||
const workerDetails = full.details ?? {
|
||||
common_mistake: full.common_mistake ?? "",
|
||||
realistic_solution: full.realistic_solution ?? "",
|
||||
napkin_math: full.napkin_math ?? "",
|
||||
};
|
||||
const merged: Question = {
|
||||
...summary,
|
||||
scenario: full.scenario ?? summary.scenario,
|
||||
details: {
|
||||
// Preserve MCQ options/correct_index that came in the summary.
|
||||
...summary.details,
|
||||
...workerDetails,
|
||||
},
|
||||
};
|
||||
_detailsCache.set(id, merged);
|
||||
return merged;
|
||||
}
|
||||
|
||||
/**
|
||||
* Pre-warm the details cache for a batch of IDs (e.g., gauntlet session).
|
||||
* Fires fetches in parallel, resolves when all complete (or time out).
|
||||
* Fires fetches in parallel; individual failures don't reject the batch.
|
||||
*/
|
||||
export async function prefetchQuestionDetails(ids: string[]): Promise<void> {
|
||||
await Promise.all(ids.map(id => getQuestionFullDetail(id)));
|
||||
await Promise.allSettled(ids.map(id => getQuestionFullDetail(id)));
|
||||
}
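A quick illustration (not from this diff) of what the `Promise.allSettled` swap in `prefetchQuestionDetails` buys: `Promise.all` rejects the whole batch on the first failed detail fetch, losing the remaining cache warms, while `allSettled` lets every fetch run to completion and reports per-promise outcomes. `Settled`, `countFailures`, and `demo` are hypothetical names for this sketch only.

```typescript
// Shape of one allSettled outcome (mirrors PromiseSettledResult).
type Settled<T> =
  | { status: "fulfilled"; value: T }
  | { status: "rejected"; reason: unknown };

// Pure helper: how many members of the batch failed.
function countFailures<T>(settled: Settled<T>[]): number {
  return settled.filter((s) => s.status === "rejected").length;
}

async function demo(): Promise<number> {
  // The batch resolves even though one member rejected; with
  // Promise.all this would throw instead.
  const results = await Promise.allSettled([
    Promise.resolve("q-1"),
    Promise.reject(new Error("worker 503")),
    Promise.resolve("q-3"),
  ]);
  return countFailures(results);
}
```

Callers that need per-question error reporting can inspect the settled array; the prefetch path above deliberately ignores it and relies on `useFullQuestion` to surface errors per question.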

@@ -3,13 +3,19 @@
  *
  * The bundled corpus is summary-only (id/title/level/zone/topic/… — no
  * scenario/details). When a component needs the heavy fields, wrap the
- * summary with this hook. It fetches from the worker and re-renders.
+ * summary with this hook. It fetches from the Worker and re-renders.
  *
+ * Returns { question, status }:
+ *  - question: the best record we have (summary on first render, or after
+ *    a failed fetch; full record once the Worker resolves)
+ *  - status: 'loading' while the fetch is in flight, 'ready' on success,
+ *    'error' if the Worker is unreachable. Callers can render an error
+ *    hint ("Details unavailable — retry") when status === 'error'.
+ *
  * Usage:
- *   const current = getQuestionById(qId);  // sync, summary only
- *   const full = useFullQuestion(current); // async hydrate
- *   // First render: full === current (scenario/details undefined)
- *   // After fetch: full === { ...current, scenario, details }
+ *   const summary = getQuestionById(qId);
+ *   const { question, status } = useFullQuestion(summary);
+ *   if (status === 'error') return <DetailsUnavailable onRetry={…} />;
  */

 "use client";
@@ -17,39 +23,55 @@
 import { useEffect, useState } from "react";
 import { getQuestionFullDetail, type Question } from "../corpus";

-export function useFullQuestion(summary: Question | undefined | null): Question | undefined {
-  const [hydrated, setHydrated] = useState<Question | undefined>(
-    summary ?? undefined,
-  );
+export type UseFullQuestionStatus = "loading" | "ready" | "error";
+
+export interface UseFullQuestionResult {
+  question: Question | undefined;
+  status: UseFullQuestionStatus;
+}
+
+export function useFullQuestion(
+  summary: Question | undefined | null,
+): UseFullQuestionResult {
+  const [result, setResult] = useState<UseFullQuestionResult>(() => ({
+    question: summary ?? undefined,
+    status: summary ? "loading" : "ready",
+  }));

   useEffect(() => {
     if (!summary) {
-      setHydrated(undefined);
+      setResult({ question: undefined, status: "ready" });
       return;
     }
-    // If we already have scenario cached in the summary, skip fetch.
+    // Already hydrated in the summary itself (rare, but possible if a
+    // future bundle ships details inline). Skip the fetch.
     if (summary.scenario && summary.details?.realistic_solution) {
-      setHydrated(summary);
+      setResult({ question: summary, status: "ready" });
       return;
     }
     // Seed with summary so listing UI renders instantly; then hydrate.
-    setHydrated(summary);
+    setResult({ question: summary, status: "loading" });
     let cancelled = false;
-    getQuestionFullDetail(summary.id).then(full => {
-      if (cancelled || !full) return;
-      // Merge rather than replace: the worker returns the heavy fields
-      // (scenario, details) but does not necessarily carry every
-      // summary-bundle field. Summary fields like `question` (the
-      // explicit-ask prompt) live in the bundle and would otherwise be
-      // dropped by a straight replace. Spread summary first so worker
-      // values win where they overlap (they carry the real content),
-      // but summary-only fields survive.
-      setHydrated({ ...summary, ...full });
-    });
+    getQuestionFullDetail(summary.id)
+      .then(full => {
+        if (cancelled) return;
+        if (!full) {
+          setResult({ question: summary, status: "error" });
+          return;
+        }
+        // Merge rather than replace: the Worker returns the heavy fields
+        // (scenario, details) but does not necessarily carry every
+        // summary-bundle field. Spread summary first so Worker values
+        // win where they overlap, but summary-only fields survive.
+        setResult({ question: { ...summary, ...full }, status: "ready" });
+      })
+      .catch(() => {
+        if (cancelled) return;
+        setResult({ question: summary, status: "error" });
+      });
     return () => {
       cancelled = true;
     };
-  }, [summary?.id]); // re-run when the summary ID changes
+  }, [summary?.id]);

-  return hydrated;
+  return result;
 }
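A minimal sketch of the spread-merge contract the hook's comment describes: Worker-provided fields win where the two records overlap, summary-only fields survive. `QuestionLike` and `mergeSummaryWithFull` are illustrative names for this note, not the site's actual types.

```typescript
// Stand-in for the site's Question shape, trimmed to the fields that
// matter for the merge-order argument.
interface QuestionLike {
  id: string;
  title: string;
  question?: string; // summary-bundle-only field
  scenario?: string; // heavy field, Worker-provided
}

function mergeSummaryWithFull(
  summary: QuestionLike,
  full: QuestionLike,
): QuestionLike {
  // Spread order matters: keys present on `full` override `summary`,
  // but keys `full` lacks entirely (like `question`) fall through.
  return { ...summary, ...full };
}
```

This is why a straight replace would be wrong: a Worker row that omits the bundle-only `question` field would silently drop it.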

@@ -1,49 +0,0 @@
-/**
- * React hook — single-question fetch through the Phase-4 cutover router.
- *
- * Components that opt into the cutover import `useVaultQuestion` instead of
- * calling `corpus.getQuestions()` synchronously. On `NEXT_PUBLIC_VAULT_FALLBACK=static`
- * it returns the question from the bundled corpus (synchronous resolve);
- * otherwise it fetches via the Worker API through `corpus-source.ts`.
- *
- * Part of B.17 — the migration path for existing components is one-at-a-time
- * swap from `corpus.getQuestionById()` to `useVaultQuestion()`.
- */
-
-import { useEffect, useState } from "react";
-import { getQuestionById } from "../corpus-source";
-
-export interface UseVaultQuestionState<T> {
-  data: T | null;
-  loading: boolean;
-  error: Error | null;
-}
-
-export function useVaultQuestion<T = unknown>(id: string | null): UseVaultQuestionState<T> {
-  const [state, setState] = useState<UseVaultQuestionState<T>>({
-    data: null,
-    loading: id !== null,
-    error: null,
-  });
-
-  useEffect(() => {
-    if (id === null) {
-      setState({ data: null, loading: false, error: null });
-      return;
-    }
-    let cancelled = false;
-    setState(s => ({ ...s, loading: true, error: null }));
-    getQuestionById(id)
-      .then(result => {
-        if (cancelled) return;
-        setState({ data: result as T, loading: false, error: null });
-      })
-      .catch(err => {
-        if (cancelled) return;
-        setState({ data: null, loading: false, error: err instanceof Error ? err : new Error(String(err)) });
-      });
-    return () => { cancelled = true; };
-  }, [id]);
-
-  return state;
-}
@@ -1,5 +1,5 @@
 import taxonomyData from "../data/taxonomy.json";
-import corpusData from "../data/corpus.json";
+import corpusData from "../data/corpus-summary.json";
 import zonesData from "../data/zones.json";
 import {
   HardDrive, Cpu, Rocket, Layers, Timer, Shuffle,

@@ -1,22 +0,0 @@
-/**
- * Fallback-mode detection for the Phase-4 cutover.
- *
- * When NEXT_PUBLIC_VAULT_FALLBACK=static, the site reads from the bundled
- * corpus.json (pre-cutover behavior preserved). When unset or 'vault', the
- * site reads from the Worker API via vault-api.ts.
- *
- * One config change inverts the dataflow — no file restore required
- * (ARCHITECTURE.md §7.1 / §6.2, fix for C-1 "one-line revert" lie).
- */
-
-export type VaultSource = "static" | "vault-api";
-
-export function getVaultSource(): VaultSource {
-  const flag = process.env.NEXT_PUBLIC_VAULT_FALLBACK?.toLowerCase();
-  if (flag === "static") return "static";
-  return "vault-api";
-}
-
-export function usingFallback(): boolean {
-  return getVaultSource() === "static";
-}
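For the record, the deleted `vault-fallback.ts` router reduces to a single case-insensitive flag comparison, which is exactly what the surviving `shouldUseStaticDetails()` keeps. A sketch under assumed names (`Source` and `resolveSource` are illustrative, not the repo's API):

```typescript
type Source = "static" | "vault-api";

// Anything other than the literal (case-insensitive) "static" means the
// Worker API; unset behaves the same as "vault".
function resolveSource(flag: string | undefined): Source {
  return flag?.toLowerCase() === "static" ? "static" : "vault-api";
}
```

The opt-in-only semantics follow directly: with the env var unset, as in production deploys, the static path can never activate.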
@@ -1,210 +0,0 @@
-# Cutover-Day QA Checklist
-
-> **When to use**: Phase 4 cutover (static `corpus.json` → Worker API + D1).
-> **Who runs**: release operator, sequentially, alone — not in parallel with other site work.
-> **Expands**: ARCHITECTURE.md §19.4.
-> **Rehearsal**: this checklist runs end-to-end on **staging** as a dry run before production cutover.
-
----
-
-## 0. Pre-cutover gate checks (must all be GREEN before starting)
-
-- [ ] `vault verify <release>` on the release to be deployed → exit 0.
-- [ ] `vault smoke-test --env staging --samples 50` → 0 divergences.
-- [ ] All E2E Playwright tests green on staging against staging D1.
-- [ ] Lighthouse CI gates green on staging:
-  - [ ] practice/page.js transferred ≤ 300 KB gz.
-  - [ ] gauntlet/page.js ≤ 250 KB gz.
-  - [ ] landing/page.js ≤ 200 KB gz.
-  - [ ] FCP (95th pct, 4G) ≤ 1.2s.
-  - [ ] TTI (95th pct, 4G) ≤ 2.5s.
-  - [ ] Repeat-visit TTI ≤ 800ms.
-  - [ ] API round-trip p99 ≤ 250ms (question detail).
-- [ ] FTS5 load-test artifacts from Phase 3 still valid (re-run if >30 days old).
-- [ ] R2 pre-deploy snapshot of current production D1 exists and is restore-tested.
-- [ ] Rollback drill executed on staging within last 7 days (see §4).
-- [ ] Go/no-go reviewed with user. **GO** recorded in an operator log.
-
-If ANY item is red, **do not proceed**. Fix the underlying issue, re-run the gate.
-
----
-
-## 1. Ship the release
-
-> **Note on canary staging** (R10-F-2 + R11): percentage-based traffic split
-> is not implemented in `vault ship` (deferred to Phase 7 per ARCHITECTURE.md
-> §4.3). Current ship is all-or-nothing at the release-keyed Cache API layer.
-> Soak windows below still apply — they're now at 100% traffic, gated on
-> dashboard-green before advancing from staging to production.
-
-- [ ] `vault ship <release> --env staging` → journal reports all 3 legs DEPLOYED.
-- [ ] `vault smoke-test --env staging --samples 50` post-ship → 0 divergences.
-- [ ] Soak 15 min OR ≥100 sessions at 100% staging traffic, whichever longer.
-- [ ] All transport SLIs green (5xx <1%, p99 <500ms).
-- [ ] All data-plane SLIs green (row-count parity, content-hash sampling, FTS5 parity, schema_fingerprint).
-- [ ] `vault ship <release> --env production` → journal reports all 3 legs DEPLOYED.
-- [ ] `.ship-journal.json` written; tail the journal.
-  - [ ] D1 deploy leg: complete (R2 snapshot taken pre-migration).
-  - [ ] Next.js deploy leg: complete.
-  - [ ] Paper-tag push leg: complete (last).
-  - [ ] `point_of_no_return: true` in journal.
-- [ ] Soak 15 min OR ≥100 sessions post-production-ship.
-- [ ] `vault smoke-test --env production --samples 100` → 0 divergences.
-- [ ] If any SLI reds during soak: `vault rollback <prev-release> --env production --method snapshot --snapshot-ts <ts>` (§6.2 primary path).
-
----
-
-## 2. User-facing flows (manual QA on production)
-
-Operator runs each flow in a clean browser window (no extensions, no prior localStorage). Check the box if the flow completes without error AND the expected outcome is visible.
-
-### 2.1 Home / landing
-
-- [ ] `https://staffml.mlsysbook.ai/` loads.
-- [ ] Total question count matches `vault stats --release <release>` exact integer.
-- [ ] No request in Network tab for `corpus.json` (the 19 MB static file must not be fetched).
-- [ ] `practice/page.js` transferred size ≤ 300 KB gzipped (verify in DevTools → Network).
-- [ ] FCP ≤ 1.2s (check via Lighthouse).
-- [ ] `X-Vault-Release` header present on `/manifest` response; value = current release.
-
-### 2.2 Practice
-
-- [ ] Navigate to `/practice`.
-- [ ] Filter by track → results update.
-- [ ] Filter by level → results update.
-- [ ] Filter by zone → results update.
-- [ ] Combination filter (track + level + zone) returns expected subset.
-- [ ] Reveal answer on a question → solution renders (Markdown + KaTeX if applicable).
-- [ ] Navigate a chained question → "Part N of M" badge visible BEFORE reveal.
-- [ ] Click chain-badge link → chain sibling list opens.
-- [ ] AskInterviewer tutor → ask a question → response arrives within 10s, no errors.
-- [ ] Reveal → AskInterviewer switches to study mode; tutor knows canonical answer.
-
-### 2.3 Gauntlet
-
-- [ ] Start a gauntlet session with filter → session launches.
-- [ ] Complete N questions (at least 3, mix of right and wrong) → scores tracked.
-- [ ] View post-mortem → per-question feedback shown.
-- [ ] Navigate back to landing → session marked complete in localStorage.
-
-### 2.4 Progress
-
-- [ ] `/progress` page loads.
-- [ ] Attempts from §2.3 persist.
-- [ ] Due-count correct against the test-interval logic.
-- [ ] No console errors.
-
-### 2.5 About
-
-- [ ] `/about` loads.
-- [ ] "Read the paper" call-out visible **above the fold** (no scrolling required on a 1920×1080 viewport).
-- [ ] BibTeX snippet renders.
-- [ ] DOI (if registered) clickable.
-- [ ] Release ID + release_hash visible in footer for reproducibility.
-- [ ] Contributor list renders authors from current release's `authors:` fields.
-
-### 2.6 Command palette / search
-
-- [ ] `⌘K` (Mac) / `Ctrl+K` (Windows/Linux) opens modal from any page.
-- [ ] Input placeholder: "Search N questions by title, scenario, or solution."
-- [ ] Type a term → 200ms debounce → results appear with snippet highlights.
-- [ ] Up/Down arrow navigates results.
-- [ ] Enter opens question; `⌘Enter` opens in new tab.
-- [ ] Escape closes modal.
-- [ ] Empty query state: helpful message + browse-by-topic link.
-- [ ] No-results state: "no results for '...'" message + clear-filters CTA.
-- [ ] Mobile (iPhone 15 viewport, 393×852): full-screen modal, no iOS zoom on input focus, touch targets ≥ 44px.
-
-### 2.7 Chain UX
-
-- [ ] On a chained question (e.g., part 2 of 4), pre-reveal chain badge is visible.
-- [ ] Badge text: "Part 2 of 4 — <chain name>".
-- [ ] Badge click → sibling list drawer; shows all chain members with their status (attempted / unattempted).
-- [ ] Analytics events fired: `chain_badge_shown`, `chain_badge_clicked` (check Cloudflare Analytics real-time).
-
-### 2.8 Offline resilience
-
-- [ ] With the site loaded and at least 5 questions visited:
-  - [ ] Open DevTools → Application → Service Workers → verify `sw.js` registered, controlling.
-  - [ ] Network → check "Offline" → reload page.
-  - [ ] Site shell renders.
-  - [ ] Previously-visited question detail pages load from SW cache.
-  - [ ] "Serving from cache" indicator visible.
-- [ ] Toggle back online → SW revalidates manifest → indicator disappears.
-
----
-
-## 3. Network + bundle verification
-
-- [ ] **No `corpus.json` fetch** anywhere in the user journey (Network tab filter `corpus`).
-- [ ] **Request to `/manifest` returns < 5 KB.**
-- [ ] **Request to `/questions/<id>` returns < 10 KB and has correct `ETag` format** `"<release>:<resource>:<content_hash>"`.
-- [ ] **304 behavior**: hard-refresh a just-visited question → browser sends `If-None-Match` → Worker returns 304.
-- [ ] **Cache API hit on warm**: refresh → Network tab shows `from disk cache` or `from service worker` for manifest/taxonomy.
-- [ ] **No console errors** across all flows above.
-- [ ] **No CSP violations** (DevTools → Console filter `Content-Security-Policy`).
-
----
-
-## 4. Rollback drill (executed on staging before production cutover)
-
-Rehearsal, not optional. Log steps + timings in the operator log.
-
-- [ ] Staging site warm with an active service worker (user has visited ≥10 questions).
-- [ ] Set `NEXT_PUBLIC_VAULT_FALLBACK=static` in the site environment.
-- [ ] Redeploy site (one command).
-- [ ] **Timer start.**
-- [ ] User reloads tab.
-- [ ] Service worker evicts stale release-keyed entries.
-- [ ] Site loads from static inlined corpus + manifest.
-- [ ] Question detail pages render.
-- [ ] No console errors.
-- [ ] AskInterviewer: if worker is still up, tutor works; if down, graceful "tutor temporarily unavailable" indicator.
-- [ ] **Timer stop.** Target: rollback complete + user-visible within 10 minutes. Record actual.
-- [ ] Restore `NEXT_PUBLIC_VAULT_FALLBACK` unset; redeploy; verify Worker-backed state resumes.
-
-If ANY step is red, do NOT proceed to production cutover. File an issue and fix the rollback path first.
-
----
-
-## 5. Post-cutover watch (first 48 hours on production)
-
-- [ ] Dashboard watch scheduled: 30 min, 2h, 6h, 12h, 24h, 48h checkpoints.
-- [ ] At each checkpoint:
-  - [ ] Transport SLIs green.
-  - [ ] All data-plane SLIs green (row-count, content-hash sample, FTS5, schema_fingerprint, release-id propagation).
-  - [ ] Search latency p99 within budget.
-  - [ ] Error-tracker: no new Sentry clusters.
-  - [ ] Cost ledger: D1 row-reads tracking within 2× forecast.
-- [ ] At 48h: post-cutover review with user; decide on Phase 5 kickoff.
-
----
-
-## 6. Rollback trigger — when to abort
-
-If any of the following occur within the first 48h, trigger rollback via `NEXT_PUBLIC_VAULT_FALLBACK=static`:
-
-- 5xx rate > 5% sustained for > 2 min.
-- p99 latency > 1 s sustained for > 5 min.
-- Any data-plane SLI red for > 10 min without explanation.
-- Schema-fingerprint mismatch that persists past a single POP cold-start cycle.
-- User-visible content corruption (question renders differently from staging).
-- Cost forecast exceeded by > 3× over any 1-hour window.
-
-Rollback does NOT require another user approval — this checklist pre-authorizes the operator to roll back on trigger conditions. Forward-fix decisions (vs rollback) are user-approval-gated.
-
----
-
-## 7. Post-cutover sign-off
-
-After 48h clean watch:
-
-- [ ] Final `vault smoke-test --env production --samples 100` green.
-- [ ] Operator log committed to `interviews/vault/releases/<version>/cutover-log.md`.
-- [ ] Retention policy noted: keep `corpus.json` in site bundle until first schema-major bump OR 2 releases post-cutover, whichever is later (ARCHITECTURE.md §7.1).
-- [ ] Phase 4 marked complete in the project tracker.
-- [ ] Post-mortem session scheduled if anything from §6 triggered during watch window.
-
----
-
-**End of cutover checklist.** File at `interviews/vault-cli/docs/CUTOVER_QA.md` — keep in sync with ARCHITECTURE.md and TESTING.md.
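The deleted checklist's §3 asserts the ETag shape `"<release>:<resource>:<content_hash>"`. If an operator prefers scripting that check over eyeballing DevTools, a sketch follows (the format string comes from the checklist; `parseVaultETag` is a hypothetical helper, not the Worker's actual code):

```typescript
interface VaultETag {
  release: string;
  resource: string;
  contentHash: string;
}

// Accepts a strong ETag like "\"2026.04:question-cloud-0001:abc123\"";
// returns null for anything that doesn't match the three-part shape.
function parseVaultETag(header: string): VaultETag | null {
  const m = header.match(/^"([^"]+)"$/); // strip surrounding quotes
  if (!m) return null;
  const parts = m[1].split(":");
  if (parts.length !== 3 || parts.some((p) => p.length === 0)) return null;
  const [release, resource, contentHash] = parts;
  return { release, resource, contentHash };
}
```

A smoke script could then fetch `/questions/<id>`, parse the `ETag` header, and assert `release` equals the release reported by `/manifest`.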
|
||||
@@ -1,331 +0,0 @@
|
||||
# Resume Plan — Massive Build Session (2026-04-25)
|
||||
|
||||
**Purpose:** hand the next Claude session everything it needs to pick up
|
||||
the day's massive question-generation work without re-discovering state.
|
||||
|
||||
**Current branch:** `feat/massive-build-2026-04-25` (off
|
||||
`audit/vault-schema-folder` ← off `dev`)
|
||||
**Worktree:** `/Users/VJ/GitHub/MLSysBook-vault-audit`
|
||||
**Last commit:** `24d3269c7 feat(vault): Phase 0 — competency_area cleanup + closed-enum hardening`
|
||||
|
||||
---
|
||||
|
||||
## What's already done (do NOT redo)
|
||||
|
||||
### From the audit branch (parent)
|
||||
|
||||
- 4,754 cohort-tagged IDs renamed to clean `<track>-NNNN` form
|
||||
(commit `8a5c3ff3c`).
|
||||
- Redirect map at `interviews/vault/docs/id-renames-2026-04-25.yaml` +
|
||||
`interviews/staffml/src/data/id-redirects.json` — preserves shared
|
||||
links to renamed IDs. Wired into the practice page's `?q=` handler.
|
||||
- 8 Playwright tests passing.
|
||||
- `vault check --strict` clean.
|
||||
|
||||
### From this session (commit `24d3269c7`)
|
||||
|
||||
- **Phase 0 cleanup**: 41 malformed `competency_area` values fixed (e.g.,
|
||||
`data-pipeline-engineering` → `data`, `evaluation` → `cross-cutting`,
|
||||
`tinyml / queueing-theory` → `latency`).
|
||||
- **LinkML schema**: added `CompetencyArea` closed enum. `competency_area`
|
||||
field now references it. Future malformed values fail validation.
|
||||
- **Pydantic validator**: `_area()` field_validator on `Question` rejects
|
||||
anything outside `VALID_COMPETENCY_AREAS`.
|
||||
- **Generator defaults raised**: `batch_size` 12 → 30, `total` 12 → 30,
|
||||
`max_calls` 10 → 20. Gemini's 1M context easily handles 30 cells/call;
|
||||
the 250/day cap rewards bigger batches.
|
||||
- **`MASSIVE_BUILD_RUNBOOK.md`**: the methodology document — read this
|
||||
first if you don't know what to do next.
|
||||
|
||||
### Verified
|
||||
|
||||
- Bundle: 9,224 published, **13 canonical competency areas, 0
|
||||
malformed**.
|
||||
- All 8 Playwright tests pass.
|
||||
- `vault check --strict` clean.
|
||||
|
||||
---
|
||||
|
||||
## Current corpus state
|
||||
|
||||
```
|
||||
Published: 9,224
|
||||
cloud: 4,131 (44.8%)
|
||||
edge: 1,976 (21.4%)
|
||||
mobile: 1,644 (17.8%)
|
||||
tinyml: 1,168 (12.7%)
|
||||
global: 305 ( 3.3%)
|
||||
|
||||
Drafts (status:draft): 275
|
||||
Deleted (dedup archive): 458
|
||||
Total YAMLs: 9,982
|
||||
|
||||
Visual-eligible (published): 17 across 8 of 10 archetypes
|
||||
Missing: collective-communication (0), kv-cache-management (0)
|
||||
|
||||
Top track-area gaps:
|
||||
TinyML/parallelism: 0 of ~90 expected
|
||||
Mobile/parallelism: 0 of ~127 expected
|
||||
Edge/parallelism: 11 of ~152 expected
|
||||
TinyML/networking: 2 of ~90 expected
|
||||
Global L4-L6+: ~13% of expected density
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## API budget
|
||||
|
||||
- **Gemini cap**: 250 calls/day
|
||||
- **Used today (estimate)**: ~30 calls (audit + Phase 0 dry-runs)
|
||||
- **Available**: ~220 calls
|
||||
- **Plan budget**: ~80 calls (40 generation + 30 judge + 10 buffer)
|
||||
- **Headroom remaining**: 140 calls for retries
|
||||
|
||||
---
|
||||
|
||||
## What to do next — execute these phases in order
|
||||
|
||||
Each phase is a single command (or short sequence). Stop after Phase 7
|
||||
or earlier if anything looks wrong.
|
||||
|
||||
### Phase 1 — Run the analyzer (1 minute)
|
||||
|
||||
```bash
|
||||
cd /Users/VJ/GitHub/MLSysBook-vault-audit
|
||||
python3 interviews/vault/scripts/analyze_coverage_gaps.py \
|
||||
--total 100 --published-only
|
||||
```
|
||||
|
||||
Output goes to `interviews/vault/_validation_results/coverage_gaps/<ts>/`.
|
||||
Look at `report.md` for the priority gap ranking. Top cells should be
|
||||
the TinyML/Mobile/Edge parallelism rows and Global L4-L6+ cells.
|
||||
|
||||
### Phase 2 — Bump loop defaults, then run (2-4 hours wall clock; 80 API calls)
|
||||
|
||||
First, bump the loop defaults. **Edit** `interviews/vault/scripts/iterate_coverage_loop.py`:
|
||||
|
||||
| Flag | Current default | New default |
|
||||
|---|---|---|
|
||||
| `--max-iters` | 20 | 30 |
|
||||
| `--max-calls` | 60 | 80 |
|
||||
| `--gen-batch-size` | 12 | 30 |
|
||||
| `--gen-calls-per-iter` | 3 | 4 |
|
||||
| `--judge-chunk-size` | 15 | 25 |
|
||||
|
||||
Specifically lines 220-226 of `iterate_coverage_loop.py`. Update both the
|
||||
`default=N` values AND the help text comments.
|
||||
|
||||
Then run the loop:
|
||||
|
||||
```bash
|
||||
python3 interviews/vault/scripts/iterate_coverage_loop.py \
|
||||
--max-iters 30 \
|
||||
--max-calls 80 \
|
||||
--gen-batch-size 30 \
|
||||
--gen-calls-per-iter 4 \
|
||||
--judge-chunk-size 25 \
|
||||
--visual-each-iter \
|
||||
--gap-threshold 0.8 \
|
||||
--max-drop-rate 0.35
|
||||
```
|
||||
|
||||
Each iteration:
|
||||
- 4 generation calls × 30 cells = 120 questions
|
||||
- 1-2 judge calls
|
||||
- ~5 minutes wall clock
|
||||
|
||||
The loop self-paces and stops on saturation (drop rate > 35%, gap
|
||||
priority < 0.8, or convergence on the same top cell two iters in a
|
||||
row).
|
||||
|
||||
**Expected output**: 600-1,200 generated drafts, 70-75% pass rate via
|
||||
judge, 8-15 iterations before auto-stop.
|
||||
|
||||
### Phase 3 — Quality gate (10 min)
|
||||
|
||||
Spot-read 3 generated drafts per track:
|
||||
|
||||
```bash
|
||||
ls -t interviews/vault/questions/cloud/*.yaml | head -3 | xargs -I{} cat {}
|
||||
ls -t interviews/vault/questions/tinyml/*.yaml | head -3 | xargs -I{} cat {}
|
||||
# etc.
|
||||
```
|
||||
|
||||
Check the visual quality on 2 random visual drafts via Playwright by
|
||||
deep-linking. Open `/practice?q=<id>` for an SVG visual that was just
|
||||
rendered, eyeball whether it fits the column at 720px width without
|
||||
overflow, alt text reads clean, no horizontal scroll.
|
||||
|
||||
### Phase 4 — Promote PASS items + rebuild bundle (5 min)
|
||||
|
||||
```bash
|
||||
python3 interviews/vault/scripts/promote_validated.py
|
||||
PYTHONPATH=interviews/vault-cli/src \
|
||||
python3 -m vault_cli.main build --legacy-json
|
||||
PYTHONPATH=interviews/vault-cli/src \
|
||||
python3 -m vault_cli.main check --strict
|
||||
```
|
||||
|
||||
Acceptance: `vault check --strict` returns exit 0, no orphan chains,
|
||||
`published_count` is up by ~600-900.
|
||||
|
||||
### Phase 5 — Refresh paper artifacts (10 min)
|
||||
|
||||
```bash
|
||||
# vault build re-emits corpus.json to staffml/. Mirror it to vault/:
|
||||
cp interviews/staffml/src/data/corpus.json interviews/vault/corpus.json
|
||||
|
||||
# Then the paper-side regen sequence:
|
||||
cd interviews/paper
|
||||
python3 scripts/analyze_corpus.py # legacy schema corpus_stats.json
|
||||
python3 scripts/generate_figures.py # 4 data figures
|
||||
PYTHONPATH=../vault-cli/src python3 scripts/generate_macros.py
|
||||
# macros.tex + corpus_stats.json (overwrites legacy)
|
||||
|
||||
# Update hardcoded zone counts in paper.tex if shifted:
|
||||
# Line ~867: "diagnosis (1{,}583), fluency (1{,}227), and evaluation (1{,}113)"
|
||||
# Replace with new values from current corpus_stats.json by_zone.
|
||||
|
||||
pdflatex -interaction=nonstopmode paper.tex
|
||||
```
|
||||
|
||||
Acceptance: `Output written on paper.pdf (N pages, ...)` with no
|
||||
"undefined citation" errors in the output (citation warnings are pre-
|
||||
existing and unrelated).
|
||||
|
||||
### Phase 6 — GUI verification (5 min)

```bash
# Restart the dev server fresh:
pkill -f "next-server\|next dev"; sleep 1
cd /Users/VJ/GitHub/MLSysBook-vault-audit/interviews/staffml
(npx next dev > /tmp/staffml-dev.log 2>&1 &)
sleep 8
curl -sI http://localhost:3000/practice 2>&1 | head -1  # expect 200

npx playwright test tests/practice-smoke.spec.ts --reporter=list
```

Acceptance: all 8 tests pass.

Then a manual eyeball: open `http://localhost:3000/practice` in a browser, click the area filter, and confirm exactly 13 canonical entries plus "All". This is the user-facing fix that motivated Phase 0.
### Phase 7 — Atomic commit (3 min)

```bash
cd /Users/VJ/GitHub/MLSysBook-vault-audit
git status --short  # should show vault/questions/ changes + paper artifacts

git add interviews/vault/questions/ \
  interviews/staffml/src/data/corpus.json \
  interviews/staffml/src/data/corpus-summary.json \
  interviews/staffml/src/data/vault-manifest.json \
  interviews/paper/macros.tex \
  interviews/paper/corpus_stats.json \
  interviews/paper/figures/ \
  interviews/paper/paper.tex \
  interviews/vault/_validation_results/

git commit -m "feat(vault): massive build — N drafts generated, M promoted

Phase 1 (analyzer): top priority cells were tinyml/parallelism (0/90),
mobile/parallelism (0/127), edge/parallelism (11/152).
Phase 2 (loop): <ITERS> iterations, <CALLS> API calls, <GEN> generated.
Auto-stop fired on: <SATURATION REASON>.
Phase 3 (quality): spot-read 15 drafts; <Y/N> needed manual edits.
Phase 4 (promote): <K> PASS items promoted; bundle now <P> published.
Phase 5 (paper): macros bumped to <P>, figures rebuilt, zone-count
prose updated.
Phase 6 (GUI): all 8 Playwright tests pass; area filter shows 13
canonical entries.

The runbook (vault/docs/MASSIVE_BUILD_RUNBOOK.md) is the methodology
this session followed; it can be re-run on any future generation day."
```

If the corpus.json hand-edit warning fires, add the trailer:

```
Vault-Override: corpus-json-hand-edit: regenerated via vault build
```
---

## Common saturation outcomes

If Phase 2's loop stops early, the auto-stop reason will be one of:

| Reason | Meaning | What to do |
|---|---|---|
| `top priority gap < 0.8` | The corpus is balanced enough that no cell is desperately empty | This is success. Move to Phase 3. |
| `DROP rate > 35%` | Gemini is hallucinating; the cells being targeted are nonsensical for some tracks | Inspect the latest iteration's `judge_summary.json` to see which cells failed, then add them to `TRACK_TOPIC_BLOCKLIST` in `analyze_coverage_gaps.py`. |
| `same top cell two iters in a row` | The generator can't fill the cell (likely a matplotlib script crashing) | Check `_validation_results/gemini_generation/<latest>/raw_*.txt` for the source code Gemini generated; render it manually with `python3 render_visuals.py --id <id>` to see the error. |
| `max-iters reached` | Hit the iteration cap before saturation | Re-run with a higher cap (`--max-iters 50`) if budget allows. |
| `max-calls reached` | Burned through the API budget | Stop. We're done for the day. |
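The DROP-rate trigger is simple arithmetic over the judge's verdicts. A minimal sketch, assuming `judge_summary.json` boils down to a list of per-item verdict strings (the real file's shape may differ; the sample below is hand-made, not real data):

```python
from collections import Counter

def drop_rate(verdicts):
    """Tally judge verdicts and return the DROP fraction.

    `verdicts` is assumed to be a flat list of verdict strings
    ("PASS" / "NEEDS_FIX" / "DROP") pulled out of judge_summary.json.
    """
    counts = Counter(verdicts)
    total = sum(counts.values())
    return counts.get("DROP", 0) / total if total else 0.0

# Hand-made sample, sized to mirror a 26-call iteration:
sample = ["PASS"] * 12 + ["NEEDS_FIX"] * 4 + ["DROP"] * 10
rate = drop_rate(sample)
print(f"DROP rate: {rate:.1%}")   # 10/26 ≈ 38.5%
print("auto-stop:", rate > 0.35)  # True — this iteration would halt the loop
```

The 35% threshold is the loop's halt condition, not a quality target; a run that stops here needs blocklist or prompt changes, not more calls.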
---

## What NOT to do

These are settled decisions; don't relitigate without explicit user direction:

- ❌ Don't add `<track>/<topic>/` subdirs (ARCHITECTURE.md §3.3 — flat is correct).
- ❌ Don't rename more legacy IDs (already done: 4,754 renamed in commit `8a5c3ff3c`).
- ❌ Don't merge to dev without explicit user OK.
- ❌ Don't push to remote without explicit user OK.
- ❌ Don't change schema enum values (CompetencyArea, Track, Level, Zone, Status, Provenance) — those are the canonical 4-axis taxonomy.
- ❌ Don't auto-promote NEEDS_FIX items; only PASS verdicts go to published.
- ❌ Don't skip the Pydantic validator pass (`vault check --strict`) before commit.
---

## Files of interest (for context)

| File | Why |
|---|---|
| `interviews/vault/docs/MASSIVE_BUILD_RUNBOOK.md` | The full day's methodology. Read first. |
| `interviews/vault/audit/2026-04-25-schema-folder-audit.md` | Why the schema/folder is shaped the way it is. |
| `interviews/vault/CHANGELOG.md` | History of the v0.1 → v1.0 migration and what it fixed. |
| `interviews/vault/ARCHITECTURE.md` §3.3 | Why path-as-classification was rejected. |
| `interviews/vault/docs/ID_SCHEMES.md` | Why IDs are `<track>-NNNN`. |
| `interviews/vault/docs/id-renames-2026-04-25.yaml` | The 4,754 cohort→clean rename map. |
| `interviews/vault/scripts/iterate_coverage_loop.py` | The day's main driver. |
| `interviews/vault/scripts/analyze_coverage_gaps.py` | Priority ranking. |
| `interviews/vault/scripts/gemini_cli_generate_questions.py` | Batched Gemini generation. |
| `interviews/vault/scripts/gemini_cli_llm_judge.py` | Multi-criteria validator. |
| `interviews/vault/scripts/promote_validated.py` | Lifecycle flip. |
| `interviews/vault/scripts/render_visuals.py` | DOT/matplotlib → SVG. |
| `interviews/vault/scripts/fix_competency_areas.py` | Phase 0 cleanup script (one-time, can re-run safely). |
---

## One-liner status check (run first in next session)

```bash
cd /Users/VJ/GitHub/MLSysBook-vault-audit && \
git log --oneline -5 && echo "---" && \
git status --short | head -10 && echo "---" && \
PYTHONPATH=interviews/vault-cli/src \
  python3 -m vault_cli.main check --strict 2>&1 | tail -3 && \
echo "---" && \
python3 -c "
import json
c = json.load(open('interviews/staffml/src/data/corpus.json'))
print(f'published: {len(c)}')
visuals = [q for q in c if q.get('visual')]
print(f'with visuals: {len(visuals)}')
print('areas:', sorted(set(q['competency_area'] for q in c)))
"
```

If the output shows commit `24d3269c7`, a clean tree, `vault check` passing, and 13 canonical areas — the resume state is healthy. Proceed to Phase 1.
@@ -1,273 +0,0 @@
# Resume Plan — Phase D/E/F (Priority Gap Closure + Generator Leverage)

**Purpose:** hand the next Claude session everything it needs to close the parallelism + global L4-L6+ gaps that have remained open across two prior multi-phase pushes, plus three high-leverage generator improvements that pay for themselves on every future run.

**Companion docs (same branch):**

- `RESUME_PLAN_2026-04-25.md` — Phase 1-7 (committed at `ece6eccf2`)
- `RESUME_PLAN_RELEASE.md` — Phase A (committed at `542aaf95d`)
- this doc — Phase D/E/F

---
## Current state

| | |
|---|---|
| **Worktree** | `/Users/VJ/GitHub/MLSysBook-massive-build` |
| **Branch** | `feat/massive-build-2026-04-25-run` |
| **HEAD** | `e7cd3b24c feat(vault): Phase B + C — 144 PASS items added (B.5: 110, C.4: 34)` |
| **Bundle** | 9,688 published (was 9,224 at branch cut, +464 net) |
| **All gates** | green (`vault check --strict`, lint, doctor, codegen, validate-vault, render) |
---

## What's already done (do NOT redo)

### Phase A (commit `542aaf95d`)

- 3 structural Pydantic validators added: `visual.path-resolves`, `_zone_bloom_compatible`, `disk-coverage`
- Lint calibration via 4-expert consensus (1,308 → 0 warnings)
- Registry repaired (5,269 IDs appended); doctor split into `disk-coverage` (HARD) + `registry-history` (INFO)
- Chain integrity full pass (0 errors / 0 warnings)
- Practice page zoom modal + 9th Playwright test

### Phase B (in commit `e7cd3b24c`)

- Generator hardened: `bloom_for_zone_level()` respects ZONE_BLOOM_AFFINITY; the prompt requires a `bloom_level` field, lists the 13 canonical competency_areas inline, and demands L5/L6+ depth (no trivial division framings).
- **Validate-at-write**: every Gemini-emitted YAML round-trips through `Question.model_validate()` before the disk write.
- The B.5 loop saturated at iter 4 on `DROP rate 38.3% > 35%` (judge tightening on L6+ depth, not budget). Yield: 110 PASS in 26 calls.

### Phase C (in commit `e7cd3b24c`)

- 120 NEEDS_FIX items from the prior session re-edited via fix-agent (92 edited, 28 already resolved).
- Re-judge: 67 of 92 judged → 34 PASS / 13 NEEDS_FIX / 20 DROP. The 34 PASS were promoted.

### Saturation reasons (carry-forward signal)

- B.5: `DROP rate 38.3% exceeds 35% — likely hallucination`. The judge rejects nearly half of the L6+ depth items even with the strengthened prompt. Adding more API calls won't help; deeper prompt scaffolding will.
- C.3: 25 of 92 items unjudged (max-calls=5 chunk cap).
---

## What's still open (Phase D/E/F)

### Priority gaps that remain (three parallelism, three global)

| Gap | Current | Expected | Status |
|---|---|---|---|
| `tinyml/parallelism` (area-level) | 1 | ~95 | **never closed** |
| `mobile/parallelism` (area-level) | 0 | ~134 | **never closed** |
| `edge/parallelism` (area-level) | 13 | ~159 | barely moved |
| `global/realization/L4-L6+` | 0 | ~14 | empty |
| `global/specification/L6+` | 0 | ~5 | empty |
| `global/mastery/L5` | 0 | ~5 | empty |
**Why prior runs didn't close them**: the analyzer's recommended_plan picks **topic-level** cells (queueing-theory, memory-hierarchy-design, etc.) by priority, but the parallelism gap aggregates across multiple parallelism-flavored topics (pipeline-parallelism, collective-communication, kv-cache-management, interconnect-topology). None of those individual topic cells cracks the top-100 priority list, so the loop never targets them. Closing the area-level gap requires **hand-built topic targets**, bypassing the analyzer.
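The aggregation effect can be sketched in a few lines. The per-topic gaps and the cutoff below are illustrative numbers, not the corpus's real figures:

```python
# Illustrative per-topic gaps for mobile parallelism-flavored topics:
# each topic cell's gap is modest on its own, so none cracks a
# hypothetical per-cell priority cutoff — but the area-level sum is large.
topic_gap = {
    "pipeline-parallelism": 30,
    "collective-communication": 35,
    "kv-cache-management": 33,
    "interconnect-topology": 36,
}
TOP100_CUTOFF = 60  # hypothetical gap size needed to make the top-100 list

per_topic_visible = [t for t, g in topic_gap.items() if g >= TOP100_CUTOFF]
area_gap = sum(topic_gap.values())

print(per_topic_visible)  # [] — no single topic cell makes the list
print(area_gap)           # 134 — the gap only shows at area level
```

This is the structural blind spot E.3's `--include-areas` flag is meant to fix: weight cells by track×area gap, not only track×topic gap.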
### Three carry-forwards from C.3

- 25 unjudged items — the max-calls cap left them on the table
- 13 still NEEDS_FIX after one fix attempt — a second fix pass is possible
- 20 DROP items — could be salvaged with a deeper rewrite
---

## Phases D + E + F

### Phase D — Priority gap closure (THE mission, finally)

| ID | Task | Acceptance | Effort |
|---|---|---|---|
| D.1 | Hand-author **~50 parallelism targets** as `track:topic:zone:level` strings. Topics: `pipeline-parallelism`, `collective-communication`, `kv-cache-management`, `interconnect-topology`. Tracks: edge/mobile/tinyml at L4-L6+. Skip cloud (already dense). Save to `tools/phase_d/parallelism_targets.txt`. | File written, ≥40 cells, all 4 topics represented | 30 min |
| D.2 | Author a **parallelism-specific prompt variant** in the generator. Adds these rules: (a) forbid bandwidth-division framings (`payload / bandwidth`); (b) require a concrete topology (NVLink/IB/PCIe/RoCE/LoRa) appropriate to the track; (c) require a synchronization or bubble cost in the question; (d) require non-trivial system integration. Toggle via a `--prompt-variant parallelism` CLI flag. | Manual test: feed 5 cells, judge ≥3 of 5 PASS at high confidence | 1.5 hr |
| D.2' | **REVIEW CHECKPOINT** — surface the prompt + 5 sample drafts for user review before D.3 burns API budget | User signs off | — |
| D.3 | Run a focused loop (15-20 API calls, batch_size 30) targeting D.1's hand-built cells with `--prompt-variant parallelism` | Loop summary: ≥20 PASS items in parallelism cells | 2 hr wall clock |
| D.4 | Spot-read all PASS items from D.3 (~30-50); reject any that read as bandwidth math (manual edit to set `status: archived`, or rewrite). Promote the rest. | All promoted items have non-trivial framings | 30 min |
| D.5 | Same mechanism for **global L4-L6+**: hand-author ~20 cells, run a focused loop with the **standard prompt** (global cells aren't parallelism-flavored, just under-filled). | ≥10 global L4-L6+ PASS items | 2 hr wall clock |
| D.6 | Promote, rebuild the bundle, regen paper artifacts | `vault check --strict` clean; published count up by 30-60 | 30 min |
**Phase D total**: ~7 hr work, ~5 hr wall clock, ~30-40 API calls.
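D.1's target file is a cross product over a few short lists. A minimal sketch, assuming the `track:topic:zone:level` line format from the task; the zone names are an illustrative subset (the real vocab lives in the vault schema), and a temp path stands in for `tools/phase_d/parallelism_targets.txt`:

```python
import tempfile
from itertools import product
from pathlib import Path

tracks = ["edge", "mobile", "tinyml"]          # skip cloud, per D.1
topics = [
    "pipeline-parallelism", "collective-communication",
    "kv-cache-management", "interconnect-topology",
]
zones = ["diagnosis", "evaluation"]            # illustrative subset of the zone vocab
levels = ["L4", "L5", "L6+"]

# One target line per (track, topic, zone, level) cell:
targets = [
    f"{track}:{topic}:{zone}:{level}"
    for track, topic, zone, level in product(tracks, topics, zones, levels)
]

# D.1 saves to tools/phase_d/parallelism_targets.txt; a temp dir is used here.
out = Path(tempfile.mkdtemp()) / "parallelism_targets.txt"
out.write_text("\n".join(targets) + "\n")
print(f"{len(targets)} cells")  # 3 * 4 * 2 * 3 = 72, comfortably ≥ 40
```

In practice the hand-authored list would prune cells that make no sense for a track rather than taking the full product.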
### Phase E — Generator efficiency (compounding leverage)

| ID | Task | Acceptance | Effort | Saves |
|---|---|---|---|---|
| E.1 | **Retry-on-validation-fail** in `gemini_cli_generate_questions.py`. If `Question.model_validate()` rejects, single retry with the prompt suffix `"your previous JSON had these violations: <list>. Re-emit only the failed items, fixed."` A second failure logs a structured error and skips. | Unit test: feed a bad dict → the script retries once and recovers | 45 min | ~50% of API calls (B.5's iter 1 + iter 3 lost 8 of 26 = 31%) |
| E.2 | **Auto-update vault-manifest.json from `vault build`**. Currently maintained by hand; pre-commit caught the gap twice this session. | `vault build --legacy-json` writes a fresh manifest with current counts + hash | 30 min | Manifest-stale failures eliminated |
| E.3 | **Tighten the analyzer**: add an `--include-areas parallelism,networking` flag so the recommended_plan can include cells weighted by track×area gap (not only track×topic gap). Solves the structural issue that drove D.1's hand-authoring. | A run with `--include-areas parallelism` returns a plan with ≥10 parallelism-topic cells | 1 hr | Future runs don't need D.1's hand-build step |

**Phase E total**: ~2.5 hr.
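E.1's retry flow can be sketched end to end. Everything here is a stand-in: `validate()` mimics `Question.model_validate()` without importing pydantic, and `fake_model` plays Gemini, failing once and recovering on the retry:

```python
RETRY_SUFFIX = (
    "your previous JSON had these violations: {violations}. "
    "Re-emit only the failed items, fixed."
)

def validate(item):
    """Stand-in for Question.model_validate(): return a violation list."""
    missing = [f for f in ("id", "prompt", "bloom_level") if f not in item]
    return [f"missing field: {f}" for f in missing]

def generate_with_retry(call_model, prompt):
    """Call the model once; on validation failure, retry once with the
    violations appended. A second failure returns (None, errs) so the
    caller can log a structured error and skip."""
    item = call_model(prompt)
    errs = validate(item)
    if not errs:
        return item, []
    retry_prompt = prompt + "\n" + RETRY_SUFFIX.format(violations="; ".join(errs))
    item = call_model(retry_prompt)
    errs = validate(item)
    return (item, []) if not errs else (None, errs)

# Fake model: the first call omits bloom_level; the retry fixes it.
calls = {"n": 0}
def fake_model(prompt):
    calls["n"] += 1
    out = {"id": "mobile-9999", "prompt": "question text"}
    if calls["n"] > 1:
        out["bloom_level"] = "analyze"
    return out

item, errs = generate_with_retry(fake_model, "generate one question")
print(calls["n"], errs == [])  # 2 True — one retry, then recovered
```

The unit-test acceptance in the E.1 row is exactly this shape: a bad dict in, one retry, recovery out.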
### Phase F — Residual cleanup (completeness)

| ID | Task | Acceptance | Effort |
|---|---|---|---|
| F.1 | **Re-judge the 25 unjudged items** from C.3. Use the same fix-agent-edited paths from `tools/phase_c/needs_fix_manifest.json`. | 25 items judged; promote any that flipped to PASS | 20 min |
| F.2 | **Second-pass fix-agent** on the remaining 13 NEEDS_FIX + 20 DROP from C.3. Spawn a `general-purpose` agent with the C.3 judge's verdicts as input. | Each item edited and re-judged; promote any that flipped | 1 hr |
| F.3 | **Spot-read 20 PASS items** stratified across this push's promotions (Phase B + C combined = 144 items). Rejection bar: shallow framings, math errors, hardware-spec inaccuracies. | Reviewed list saved; rejection rate ≤ 10% | 1 hr |

**Phase F total**: ~2.5 hr.
---

## Parallelism map (what can run concurrently)

The cleanest interleaving:

```
Stage 1 — sequential prep (no API)           ~3 hr
  D.1 (hand-build targets)
  └── D.2 (parallelism prompt)
      └── E.1 (retry-on-validate-fail)
          └── E.2 (auto-manifest)
              └── E.3 (analyzer flag)
                  └── (D.2' user review)

Stage 2 — parallel execution                 ~2 hr wall clock
  D.3 (parallelism loop, 15-20 calls)  ━┓
                                        ┣━ both write disjoint IDs
  F.2 (fix-agent on 33 items)          ━┛   no race risk

Stage 3 — parallel execution                 ~2 hr wall clock
  D.5 (global loop, 10-15 calls)       ━┓
                                        ┣━ all disjoint
  F.1 (re-judge 25 unjudged)           ━┫
                                        ┃
  F.3 (spot-read first 10 of 20)       ━┛   read-only

Stage 4 — sequential finalize                ~1 hr
  D.4 (parallelism spot-read + promote)
  └── D.6 (rebuild bundle, regen paper)
      └── F.3 (finish spot-read second 10)
          └── final commit
```

**Total wall clock**: ~8 hr (vs ~10-12 hr serial).

**API budget**: ~30-40 calls expected (the Gemini cap is 250/day; today used ~76, so ~174 remaining).

### Parallelism safety rules

1. **No two generation loops concurrent** — both call `next_id_for_track()`, which is filesystem-stat-based; concurrent calls can race on the next ID. D.3 must finish before D.5 starts.
2. **Generation loop + fix-agent OK** — disjoint ID ranges (the loop writes new files, the agent edits existing ones).
3. **Generation loop + judge OK** — the judge reads files; it doesn't write to questions/.
4. **No schema changes during loops** — a schema change invalidates the validate-at-write contract mid-stream.
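Rule 1's race is worth seeing concretely. The real `next_id_for_track()` isn't reproduced here; this is a minimal model of any stat-then-write allocator, showing the duplicate ID two concurrent loops would both receive:

```python
import re
import tempfile
from pathlib import Path

def next_id_for_track(questions_dir, track):
    """Minimal model of a filesystem-stat-based allocator: scan the
    existing <track>-NNNN.yaml files and return max+1. Not atomic —
    two loops that both scan before either writes get the same ID."""
    pat = re.compile(rf"{track}-(\d+)\.yaml$")
    nums = [int(m.group(1))
            for p in Path(questions_dir).glob(f"{track}-*.yaml")
            if (m := pat.search(p.name))]
    return f"{track}-{(max(nums) if nums else 0) + 1:04d}"

with tempfile.TemporaryDirectory() as d:
    Path(d, "mobile-1962.yaml").touch()
    # Two concurrent loops both stat the directory before either writes:
    loop_a = next_id_for_track(d, "mobile")
    loop_b = next_id_for_track(d, "mobile")
    print(loop_a, loop_b)  # mobile-1963 mobile-1963 — duplicate, hence rule 1
```

Serializing the loops (D.3 before D.5) sidesteps the race without adding locking to the allocator.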
---

## Locked decisions (do NOT relitigate)

| Decision | Choice |
|---|---|
| **Release tag** | One stable dev branch, no mid-stream release tag (per the prior plan) |
| **Bloom canonical** | When zone and bloom conflict, trust bloom; reclassify the zone via `BLOOM_CANONICAL_ZONE` |
| **Validate-at-write severity** | ERROR (Pydantic hard-rejects), not WARN |
| **D.2 prompt authorship** | Claude drafts, user reviews at D.2' |
| **Test-first for E.x** | Unit tests before real API calls (cheaper failure mode) |
---

## Review checkpoints

1. **D.2'** — surface the parallelism prompt + 5 sample drafts for user review before D.3 fires the loop.
2. **D.4** — surface PASS items for spot-read; the user can flag any that read shallow.
3. **Final** — surface all gates green + the commit summary.
---

## Common saturation outcomes for D.3 / D.5

If D.3 stops early:

| Reason | Meaning | What to do |
|---|---|---|
| `DROP rate > 35%` | The judge is rejecting parallelism items as too shallow | Inspect the latest iteration's `judge_summary.json` — if the rejections are about "trivial topology" framings, tighten the D.2 prompt further. If they're about correctness errors, accept the saturation. |
| `same top cell two iters` | The generator can't fill the cell | Hit the budget cap; move on and document it as a ceiling |
| `max-calls reached` | Burned through the API budget | Stop. Commit what we have. |
| `0 drafts produced` | Validate-at-write rejected the entire batch | E.1's retry should have prevented this; if it persists, dump the prompt and inspect Gemini's raw output |
---

## What NOT to do

- ❌ Don't merge to `dev` until all gates are green AND the user explicitly OKs.
- ❌ Don't push to remote without explicit user OK.
- ❌ Don't run two generation loops concurrently (next-id race).
- ❌ Don't add `Co-Authored-By` lines or automated attribution footers.
- ❌ Don't change ZONE_BLOOM_AFFINITY or schema enum values without explicit user direction.
- ❌ Don't auto-promote NEEDS_FIX without a re-judge.
- ❌ Don't suppress lint warnings or skip pre-commit hooks (`--no-verify` forbidden).
- ❌ Don't auto-cut a release tag (`v0.1.2`) — a single stable commit is the goal.
- ❌ Don't navigate to or modify files in sibling worktrees.
---

## Files of interest

| File | Why |
|---|---|
| `interviews/vault/docs/RESUME_PLAN_2026-04-25.md` | Phase 1-7 history |
| `interviews/vault/docs/RESUME_PLAN_RELEASE.md` | Phase A history |
| `interviews/vault/docs/MASSIVE_BUILD_RUNBOOK.md` | Methodology document |
| `interviews/vault/_validation_results/coverage_loop/20260425_192956/` | Most recent loop output (B.5) — judge_summary.json per iter, NEEDS_FIX details with fix_suggestions |
| `interviews/vault/_validation_results/phase_c_rejudge/judge_summary.json/20260425_201121/summary.json` | C.3 re-judge verdicts |
| `tools/phase_c/needs_fix_manifest.json` | The 120-item NEEDS_FIX queue (the 13 still-pending + 20 DROP go here for F.2) |
| `tools/phase_b/cell_triage.json` | The 14 L6+/L5-deep cells (a subset of what D.2's prompt should target) |
| `interviews/vault/scripts/gemini_cli_generate_questions.py` | **D.2 + E.1 edit here.** |
| `interviews/vault/scripts/analyze_coverage_gaps.py` | **E.3 edits here.** |
| `interviews/vault-cli/src/vault_cli/commands/build.py` (or equivalent) | **E.2 edits here** to write the manifest. |
| `interviews/vault/schema/enums.py` | ZONE_BLOOM_AFFINITY + BLOOM_CANONICAL_ZONE + widened ZONE_LEVEL_AFFINITY (do not edit lightly) |
---

## One-liner status check (run first in next session)

```bash
cd /Users/VJ/GitHub/MLSysBook-massive-build && \
git log --oneline -3 && echo "---" && \
git status --short | head -5 && echo "---" && \
PYTHONPATH=interviews/vault-cli/src \
  python3 -m vault_cli.main check --strict 2>&1 | tail -2 && \
echo "---" && \
PYTHONPATH=interviews/vault-cli/src \
  python3 -m vault_cli.main lint interviews/vault/questions/ 2>&1 | tail -2 && \
echo "---" && \
PYTHONPATH=interviews/vault-cli/src \
  python3 -m vault_cli.main doctor 2>&1 | grep -cE "fail" | xargs -I{} echo "doctor fails: {}" && \
echo "---" && \
python3 -c "
import json
c = json.load(open('interviews/staffml/src/data/corpus.json'))
print(f'published: {len(c)}')
"
```

If the output shows commit `e7cd3b24c`, a clean tree, `vault check` passing, 0 lint warnings, 0 doctor fails, and 9,688 published — the resume state matches this plan's starting assumptions. **Proceed to D.1.**

If anything differs, **stop and reconcile** before any code edits.
---

## Pacing

This is a ~12-15 hour push compressed to ~8 hr wall clock by the parallelism map. Plausibly two focused sessions, or one long one.

The biggest risk is D.3 saturating at low yield (<10 parallelism PASS items). If that happens, D.5 becomes the only material content gain of this push, and the parallelism gap stays open as a documented limitation rather than a closed mission. That is acceptable — the branch was already StaffML-day-ready before Phase D started.

The smallest budget commitment is Phase E (no API calls; pure generator infra). If only one phase fits, do E — it compounds for every future generation run, while D is a one-time content gain.

There are three explicit user-review checkpoints (D.2', D.4, final). Wait for sign-off at each before continuing.
@@ -1,314 +0,0 @@
# Resume Plan — Release-Ready Cleanup + Balanced Generation (2026-04-25)

**Purpose:** hand the next Claude session everything it needs to take `feat/massive-build-2026-04-25-run` from "ships with caveats" to "stable dev branch ready for StaffML day."

**Companion doc:** `interviews/vault/docs/RESUME_PLAN_2026-04-25.md` (the prior session's plan — completed through Phase 7, commit `ece6eccf2`).
---

## Current state

| | |
|---|---|
| **Worktree** | `/Users/VJ/GitHub/MLSysBook-massive-build` |
| **Branch** | `feat/massive-build-2026-04-25-run` (off `feat/massive-build-2026-04-25` in `vault-audit`) |
| **HEAD** | `ece6eccf2 feat(vault): massive build — 630 drafts generated, 320 PASS promoted, paper 0.1.1` |
| **Parent branch** | `feat/massive-build-2026-04-25` in `MLSysBook-vault-audit`, untouched |

**`dev` has advanced** since this branch was cut (was `4a7c64585`, now `72a741aa1`). A future merge to `dev` will need a rebase or merge resolution. **Do not merge yet** — finish the cleanup + balanced generation first.
---

## What's already done (do NOT redo)

### From commit `ece6eccf2` (this session, 2026-04-25)

- 6-iter Gemini coverage loop ran; 50 of 80 API calls used.
- **630 drafts generated**, **320 PASS promoted** to published.
- Bundle: `9,224 → 9,544 published` (+320 exact).
- 234 visual assets mirrored to `staffml/public/question-visuals/`.
- Paper artifacts refreshed against the new `0.1.1` release (`release_hash: 0350da5706e6`); `paper.pdf` compiles to 25 pages.
- Loop defaults bumped: `max-iters 30`, `max-calls 80`, `batch 30`, `calls/iter 4`, `judge-chunk 25`.
- `fix_competency_areas.py` REMAP table extended with 30+ new patterns (zones-as-area, bloom-verbs-as-area, underscore hallucinations, dash/slash track-prefix forms). All 462 malformed drafts are now canonical.
- `vault-manifest.json` refreshed: questionCount 9,224 → 9,544, contentHash 539eb877f9cc → 0350da5706e6.
- All 8 Playwright tests pass.
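The REMAP mechanism is a lookup from malformed `competency_area` spellings to canonical names. A sketch of the idea, with wholly hypothetical key/value pairs standing in for the real table in `fix_competency_areas.py` (the canonical names here are guesses, not the schema's actual enum):

```python
# Hypothetical REMAP entries, one per pattern class named above —
# the real table is much larger and lives in fix_competency_areas.py.
REMAP = {
    "memory_hierarchy": "memory",       # underscore hallucination
    "tinyml-power": "power",            # dash track-prefix form
    "mobile/networking": "networking",  # slash track-prefix form
}

def canonicalize(area: str) -> str:
    """Normalize a malformed competency_area string to its canonical
    form; strings already canonical pass through unchanged."""
    key = area.strip().lower()
    return REMAP.get(key, key)

print(canonicalize("  TinyML-Power "))    # power
print(canonicalize("mobile/networking"))  # networking
print(canonicalize("power"))              # already canonical, unchanged
```

A dict lookup after a strip/lower pass keeps the fix idempotent, which is why the Phase 0 script can be re-run safely.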
### Saturation reason (carry-forward signal)

`same top-priority cell two iterations in a row — converged`. The top priority's decay 2.25 → 2.14 → 2.03 → 1.93 → 1.83 plateaued. Both halt conditions (gap-threshold 0.8, max-calls 80) had headroom remaining; **structural convergence fired first**. The generator cannot meaningfully shrink `tinyml/specification/L6+` further within the current prompt framing. **This is the central problem Phase B addresses.**
---

## Audit findings from this session (so the next session does not rediscover)

### 1. Distribution closure — PARTIAL FAILURE

The 320 PASS items did NOT close the priority gaps the analyzer flagged:

| Targeted gap | Before | After | Δ | % gap closed |
|---|---|---|---|---|
| tinyml/parallelism | 0 | 1 | +1 | 1% |
| tinyml/networking | 2 | 11 | +9 | 10% |
| **mobile/parallelism** | **0** | **0** | **+0** | **0%** |
| edge/parallelism | 11 | 13 | +2 | 1% |
| global L4–L6+ | 189 | 189 | +0 | 0% |

Where they actually landed: `mobile/memory` (16), `mobile/networking` (15), `tinyml/cross-cutting` (13), `tinyml/power` (13), `mobile/data` (13). All useful, none on the original priority list. **Phase B's job is to close the actual targeted cells** with prompt templates engineered for the content type, not just more API calls.
Why parallelism failed: the judge DROPped most parallelism drafts for "too-shallow framing" (e.g., the `cloud-4490` verdict: *"Simple division of payload by bandwidth is too trivial for L6+ Staff level"*). The fix is **template-level, not budget-level**.
### 2. Schema completeness — STRONG (with one defect)

- 320/320 PASS items have full `details.{realistic_solution, common_mistake, napkin_math}` ✓
- 135/136 visual references resolve to a real SVG ✓
- 1 defect: **`mobile-1962`'s graphviz render crashed silently** — only the `.dot` source exists, no `.svg`. The judge passed it because the YAML was structurally valid. `render_visuals.py` does not propagate failures.
### 3. Quality at scale — ~7.5/10 average across 10 stratified items

Strong: `edge-2431` (Jetson NvSciBuf zero-copy), `tinyml-1658` (256KB SRAM cliff diagnosis), `mobile-1923` (UFS write-amplification), `tinyml-1635` (closed-form duty-cycle), `edge-2313` (Hailo-8 PCIe pipeline bubble). The math is correct in all 10, with real hardware grounding in all 10.

Weak: `edge-2423` (asks for a "standard programming pattern" — too generic, OS-textbook style).
### 4. All-checks audit

| Gate | Result |
|---|---|
| `vault check --strict` | ✓ 0 errors / 0 invariant failures |
| `vault doctor / release-integrity` | ✓ 0.1.1 verified |
| `vault doctor / content-hash-sample` | ✓ 20/20 sampled hashes match |
| `vault doctor / registry-integrity` | ✗ 5,269 missing from the registry; 4,479 registry orphans |
| `vault lint` | 0 errors / **1,308 warnings** (all `zone-level-affinity`; 303 on new items, 1,005 pre-existing) |
| Playwright (8 tests) | ✓ all pass |
| Pre-commit hook | ✓ (after the manifest refresh) |

**Registry drift forensics (cause resolved; not a worktree issue):** The registry is identical across all 3 worktrees (MD5 `a9a259c559cc23b03ca371683ad81d6d`). The 4,479 orphan registry entries are old cohort-tagged IDs (`tinyml-exp2-desi-0184`, `cloud-fill-04027`, `tinyml-cell-13251`) left over from commit `8a5c3ff3c`'s rename refactor, which updated the YAMLs but never appended to the registry. The 5,269 disk orphans are: 4,754 renamed-INTO clean IDs + 320 from this session + ~195 prior-run unappended items. **94% of the drift pre-existed this session.**
---

## Locked decisions (do NOT relitigate)

| Decision | Choice |
|---|---|
| **A.6 lint calibration** | Spawn 4 expert agents on a stratified sample of disputed (zone, level) pairs; consolidate via `consensus-builder`; widen the rule for accepted pairs, reclassify items in rejected pairs, ack-list disputed pairs. Must hit **0 lint warnings** before proceeding. |
| **A.7 chain integrity** | Fix the data — a full pass on the 29 single-question chains + 101 non-sequential. Not the relaxation shortcut. |
| **A.8 zoom UX** | `react-medium-image-zoom` (4KB, click-to-zoom modal, ESC closes). Lightest + most responsive. |
| **B.3 prompt authorship** | Claude drafts; user reviews before B.5 fires the loop. |
| **Release cadence** | One stable dev branch at the end. No mid-stream release tags. The user's framing: *"I just want the dev branch to come to a stable point for StaffML day."* |
---

## Review checkpoints (pause for user input)

1. **After A.6.3 expert consensus lands** — before applying the calibration to the lint rule.
2. **After B.3 prompt drafts are written** — before B.5 fires the generation loop and burns API budget.
3. **Before D.2** — the final atomic commit; the user confirms the branch is stable-state ready.
---
|
||||
|
||||
## Phase A — Cleanup (sequential, blocking everything else; ~7-8 hr)
|
||||
|
||||
| ID | Task | Acceptance criterion | Effort |
|---|---|---|---|
| A.1 | Re-run `render_visuals.py` for `mobile-1962`; if graphviz still crashes, fix `.dot` source or strip the `visual:` block | `interviews/vault/visuals/mobile/mobile-1962.svg` exists OR YAML's visual block removed | 10 min |
| A.2 | `render_visuals.py`: non-zero exit on any per-item crash; capture per-ID stderr to `_validation_results/render_failures.json` | Inject a broken `.dot` test; confirm exit code != 0 + log written | 30 min |
| A.3 | LinkML schema: type the `visual` block as a structured sub-schema. `kind` enum `[svg, png]`, `path` regex `^[a-z0-9-]+\.(svg\|png)$`, required `alt` (≥10 chars) + `caption` (≥5 chars) | LinkML codegen produces typed `Visual` class; existing 234 visual items still validate | 45 min |
| A.4 | Pydantic field-validator: `visual.path` MUST resolve to a real file in `visuals/<track>/`; reject otherwise | Unit test: YAML with `visual.path: nonexistent.svg` fails `Question.model_validate()` | 30 min |
| A.5 | Registry repair: write `tools/repair_registry.py` reading disk → appending 5,269 missing IDs as `created_by: registry-rebuild-2026-04-25`. Add comment block above the new entries documenting the rename history. Refactor `doctor.py:_check_registry_integrity` into two checks: `disk-coverage` (HARD FAIL if disk file unregistered) and `registry-history` (INFO only for retired IDs). | `vault doctor` shows `disk-coverage: pass`; `registry-history: info`. Registry is append-only (no deletions). | 1 hr |
| A.6 | **Expert-driven lint calibration** (replaces the original "empirical widen" version). See A.6.* breakdown below. | `vault lint interviews/vault/questions/` reports 0 errors / **0 warnings** | 2 hr |
| A.7 | Chain integrity: 29 single-question chains + 101 chains with non-sequential positions. Audit each → fix the chain (renumber positions / extend with siblings) or drop the chain entirely. | Pre-commit hook reports 0 chain warnings | 1.5 hr |
| A.8 | Practice page: render visual inline beside question + click-to-zoom modal using `react-medium-image-zoom`. Add Playwright test: load known-visual question, click image, verify modal opens, press ESC, verify modal closes. | Playwright count 8 → 9, all pass | 1.5 hr |
| A.9 | Cleanup verification gate: `vault check --strict` 0 errors • `vault lint` 0 warnings • `vault doctor` 0 fails • Playwright 9/9 • all 320 prior PASS items still in corpus | All five gates green | 15 min |
| A.10 | Atomic commit: `cleanup(vault): registry repair + visual schema + lint calibration + zoom UI` | Pre-commit hook passes without `--no-verify` | 5 min |

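The A.3/A.4 validation rules are concrete enough to sketch in isolation. A minimal stand-in for the field-validator logic, kept as a plain function so the rules are unit-testable without the LinkML codegen or the real `Question` model (the function name and the dict shape of the `visual:` block are assumptions):

```python
import re
from pathlib import Path

# Rules from A.3: kind enum, filename regex, minimum alt/caption lengths.
PATH_RE = re.compile(r"^[a-z0-9-]+\.(svg|png)$")
KINDS = {"svg", "png"}

def validate_visual(visual: dict, track: str, visuals_root: Path) -> list[str]:
    """Return the list of violations for one `visual:` block ([] = valid).

    In the real schema this would live in a Pydantic `@field_validator`
    on the Question model (A.4); the checks are the same.
    """
    errors = []
    if visual.get("kind") not in KINDS:
        errors.append(f"kind must be one of {sorted(KINDS)}")
    path = visual.get("path", "")
    if not PATH_RE.match(path):
        errors.append(f"path {path!r} fails regex {PATH_RE.pattern}")
    elif not (visuals_root / track / path).is_file():
        # A.4: path MUST resolve to a real file in visuals/<track>/
        errors.append(f"path {path!r} does not exist under visuals/{track}/")
    if len(visual.get("alt", "")) < 10:
        errors.append("alt must be at least 10 characters")
    if len(visual.get("caption", "")) < 5:
        errors.append("caption must be at least 5 characters")
    return errors
```

This is exactly the shape the A.4 unit test needs: feed it `{"path": "nonexistent.svg", ...}` and assert the existence violation comes back.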
### A.6 expanded — expert-driven lint calibration

| Step | Action | Acceptance |
|---|---|---|
| A.6.1 | Pull all 1,308 zone-level-affinity warns; group by (zone, level) pair; pick 3-5 representative questions per disputed pair as evidence | Manifest file `tools/lint_calibration_evidence.yaml` with ~30-50 disputed-pair samples |
| A.6.2 | Spawn 4 expert agents in **parallel**: `expert-vijay-reddi`, `expert-chip-huyen`, `expert-jeff-dean`, `education-reviewer`. Each gets the same disputed-pair manifest + the question: *"for each (zone, level) pair, is it pedagogically valid? give your reasoning."* | 4 expert reports written to `.claude/_reviews/lint-calibration-<ts>/` |
| A.6.3 | **(USER REVIEW CHECKPOINT 1)** — surface the four expert reports for user review before consolidation | User signs off |
| A.6.4 | Consolidate via `consensus-builder` agent: every (zone, level) pair gets a verdict: `accepted` (≥3 experts say valid), `rejected` (≥3 say invalid), `disputed` (split) | Consensus report with verdict per pair |
| A.6.5 | For `accepted` pairs → widen lint rule. For `rejected` pairs → reclassify the affected questions (update zone or level field, vault check still passes). For `disputed` pairs → ack-list with rationale. | Updated `zone_level_affinity.yaml` rule + reclassified items committed |
| A.6.6 | Re-run `vault lint interviews/vault/questions/` → must report **0 warnings, 0 errors** | Strict pass |

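The A.6.4 verdict rule is mechanical enough to pin down as code. A sketch of the consolidation logic only — the function name and vote representation are assumptions, not the `consensus-builder` agent's actual interface:

```python
def consensus_verdict(votes: list[bool]) -> str:
    """Map per-expert validity votes for one (zone, level) pair to a verdict.

    Rule from A.6.4 with 4 experts: `accepted` if >=3 say the pair is
    pedagogically valid, `rejected` if >=3 say invalid, else `disputed`.
    """
    valid = sum(votes)
    invalid = len(votes) - valid
    if valid >= 3:
        return "accepted"
    if invalid >= 3:
        return "rejected"
    return "disputed"
```

With 4 voters a 2-2 split is the only `disputed` case, which is what feeds the ack-list path in A.6.5.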
---

## Phase B — Full balanced generation (after A.10 lands; ~9-10 hr)

The original Phase 1 analyzer flagged 100 cells. The first run hit ~30 of those and PASS-ed at unusual cells (mobile/memory etc.) rather than the priority cells. Phase B systematically attacks the full list with prompts engineered for the actual content type needed.

| ID | Task | Acceptance criterion | Effort |
|---|---|---|---|
| B.1 | Re-run analyzer against current corpus (post-cleanup): get fresh 100-cell recommended plan | Plan file written; top 20 inspected | 5 min |
| B.2 | Cell-class triage: read the 100 cells, group by failure mode the first run revealed: `parallelism-too-shallow`, `global-L6+-too-abstract`, `healthy-fillable`. Each class gets its own prompt template. | `tools/cell_triage.md` written: list of cells × class × prompt-template ref | 1 hr |
| B.3 | Author **3 specialized generator prompts**, one per failure class. **Parallelism** prompt: requires concrete topology (NVLink, IB, PCIe, RoCE, LoRa), forbids pure bandwidth division, requires synchronization or bubble cost in the question. **Global-L6+** prompt: requires cross-track synthesis (e.g., compare same constraint in tinyml + cloud), forbids generic abstractions. **Standard** prompt: refined version of current with validate-at-write fix. | 3 prompt files in `interviews/vault/scripts/prompts/`; test invocation against each produces 5 sample drafts that pass judge | 2 hr |
| B.3' | **(USER REVIEW CHECKPOINT 2)** — surface prompt drafts for user review before B.5 | User signs off | — |
| B.4 | Add validate-at-write to `gemini_cli_generate_questions.py`: every YAML round-trips through `Question.model_validate()` before write. Failures → retry once with "your previous output had X violations" prompt. Second failure → log structured error and skip. **This is the root-cause fix for the competency_area regression.** | Unit test: feed Gemini-style malformed dict → script rejects, retries, eventually skips with structured error | 1 hr |
| B.5 | Two-stage loop: Stage 1 — 30-call run targeting all 100 cells with appropriate prompt class, batch_size 30 → ~900 drafts. Stage 2 — judge in chunks of 25; re-judge any NEEDS_FIX after one auto-fix retry pass. | Loop summary shows: drafts ≥ 800, PASS rate ≥ 60%, items in priority cells (parallelism + global L6+) ≥ 80 | 4-5 hr wall clock, 50-70 calls |
| B.6 | Stratified spot-read: 20 items across (track × prompt-class × verdict). Reject drafts that read as bandwidth-math or "standard programming pattern." | Reviewed list saved; rejection rate ≤ 15% | 30 min |
| B.7 | Promote PASS items, rebuild bundle, regen paper macros, recompile PDF | `vault check --strict` clean; corpus published count grows by 200-500; macros stamped | 30 min |

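The B.4 validate-at-write policy can be sketched generically. The validator and regenerator are injected here so the sketch stands in for `Question.model_validate()` and the Gemini retry call without importing either — the function names and return shape are assumptions, not the script's real API:

```python
from typing import Callable

def write_with_validation(
    draft: dict,
    validate: Callable[[dict], list[str]],          # returns violations; [] = clean
    regenerate: Callable[[dict, list[str]], dict],  # one retry with feedback
    write: Callable[[dict], None],
) -> dict:
    """Validate-at-write policy from B.4: validate, retry once with the
    violations fed back to the generator, then skip with a structured error."""
    violations = validate(draft)
    if not violations:
        write(draft)
        return {"status": "written", "retries": 0}
    # One retry: the regeneration prompt carries the concrete violations
    # ("your previous output had X violations").
    retried = regenerate(draft, violations)
    violations = validate(retried)
    if not violations:
        write(retried)
        return {"status": "written", "retries": 1}
    # Second failure: structured error, skip -- invalid YAML never hits disk.
    return {"status": "skipped", "retries": 1, "violations": violations}
```

The invariant worth testing is the last branch: a draft that fails twice is skipped with its violations attached, never written.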
---

## Phase C — NEEDS_FIX queue (parallel with B.5/B.6 once A.10 lands; ~2.5 hr)

This run's 120 NEEDS_FIX items each carry a specific `fix_suggestion` from the judge (see `_validation_results/coverage_loop/20260425_150712/iter_*/judge_summary.json`).

| ID | Task | Acceptance | Effort |
|---|---|---|---|
| C.1 | Aggregate the 120 NEEDS_FIX from this run + any new from Phase B into a single fix manifest with per-item `fix_suggestion` + criteria flags | Manifest file written, ≥120 entries | 15 min |
| C.2 | Spawn `general-purpose` fix-agent with `quiz-generation.md` as quality bar; agent edits each YAML in place applying the judge's specific suggestion | Each YAML modified; `vault check --strict` still passes | 1.5 hr |
| C.3 | Re-judge fixed items in a small chunked run (~3-5 calls) | Verdict distribution recorded | 30 min |
| C.4 | Promote any items that flipped to PASS | Promoted count logged | 5 min |

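C.1's aggregation is a single walk over the loop directory. A sketch of that walk — note the per-item shape (`id`, `verdict`, `fix_suggestion`, `criteria` keys inside an `items` list) is an assumption about `judge_summary.json`, not a documented schema, so the field names would need checking against a real run:

```python
import json
from pathlib import Path

def build_fix_manifest(loop_dir: Path) -> list[dict]:
    """Collect every NEEDS_FIX entry from iter_*/judge_summary.json (C.1).

    Each manifest entry keeps the judge's fix_suggestion and criteria
    flags plus the source file, so the C.2 fix-agent can work per-item.
    """
    manifest = []
    for summary_path in sorted(loop_dir.glob("iter_*/judge_summary.json")):
        for item in json.loads(summary_path.read_text()).get("items", []):
            if item.get("verdict") == "NEEDS_FIX":
                manifest.append({
                    "id": item["id"],
                    "fix_suggestion": item.get("fix_suggestion", ""),
                    "criteria": item.get("criteria", {}),
                    "source": str(summary_path),
                })
    return manifest
```

Pointing this at `_validation_results/coverage_loop/20260425_150712/` plus any Phase B loop directory gives the single combined manifest C.1 calls for.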
**Concurrency safety:** Phase C touches *existing* NEEDS_FIX YAMLs; Phase B writes *new* IDs. Different ID ranges → no write race. Neither phase may run while Phase A is in flight (schema/lint changes).

---

## Phase D — Final stable state (after B + C; ~1 hr)

| ID | Task | Acceptance | Effort |
|---|---|---|---|
| D.1 | Re-run all gates: `vault check --strict` • `vault lint` (0 warnings) • `vault doctor` (0 fails) • Playwright (9/9) • paper compile (0 LaTeX errors) • registry append-only invariant verified. Wrap as `tools/release_gate.sh`. | Single shell script returns exit 0 | 30 min |
| D.2 | **(USER REVIEW CHECKPOINT 3)** — surface final state to user. | User signs off | — |
| D.3 | Atomic final commit: `feat(vault): release-ready cleanup + balanced generation` | Pre-commit clean; branch ready for StaffML day | 10 min |

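`tools/release_gate.sh` is just sequential gate execution with a pass/fail report and a single exit code. The same pattern, sketched in Python so it sits alongside the other sketches in this plan (the gate commands in the trailing comment mirror the one-liner at the end of this document; the real script would be shell per D.1):

```python
import subprocess
import sys

def run_gates(gates: list[tuple[str, list[str]]]) -> int:
    """Run each (name, argv) gate, print PASS/FAIL per gate, and return a
    shell-style exit code: 0 only if every gate passed (the D.1 contract)."""
    failed = 0
    for name, argv in gates:
        result = subprocess.run(argv, capture_output=True)
        status = "PASS" if result.returncode == 0 else "FAIL"
        failed += result.returncode != 0
        print(f"{status}  {name}")
    return 1 if failed else 0

# The real gate list would invoke the vault CLI, Playwright, and the paper
# compile, e.g.:
# sys.exit(run_gates([
#     ("check-strict", ["python3", "-m", "vault_cli.main", "check", "--strict"]),
#     ("lint", ["python3", "-m", "vault_cli.main", "lint", "interviews/vault/questions/"]),
#     ("doctor", ["python3", "-m", "vault_cli.main", "doctor"]),
# ]))
```

Running all gates rather than stopping at the first failure keeps the report useful: one invocation shows every red gate at once.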
---

## Common saturation outcomes (mirroring prior plan)

If Phase B's loop stops early:

| Reason | Meaning | What to do |
|---|---|---|
| `top priority gap < 0.8` | Corpus is balanced enough that no cell is desperately empty | Success. Move to B.6. |
| `DROP rate > 35%` | Gemini hallucinating or cells nonsensical | Inspect the latest iter's `judge_summary.json`; add offending cells to `TRACK_TOPIC_BLOCKLIST` in `analyze_coverage_gaps.py`. Likely indicates a prompt template needs another revision. |
| `same top cell two iters in a row` | Generator cannot fill the cell | Check raw Gemini output for that cell. Likely needs an even more specialized prompt. **This is what fired in the prior run.** |
| `max-iters reached` | Hit iteration cap before saturation | Re-run with a higher `--max-iters 50` if budget allows. |
| `max-calls reached` | Burned through API budget | Stop. Ship Phase C first. |

---

## What NOT to do

- ❌ Don't merge to `dev` until Phase D passes (pre-commit hook + all gates green).
- ❌ Don't push to remote without explicit user OK.
- ❌ Don't run Phase B or C concurrent with Phase A in-flight.
- ❌ Don't add `Co-Authored-By` lines or automated attribution footers.
- ❌ Don't change schema enum values (CompetencyArea, Track, Level, Zone, Status, Provenance) without explicit user direction.
- ❌ Don't auto-promote NEEDS_FIX items without re-judge.
- ❌ Don't suppress lint warnings or skip pre-commit hooks (`--no-verify` forbidden).
- ❌ Don't relitigate the locked decisions above without explicit user direction.
- ❌ Don't navigate to or modify files in sibling worktrees (`MLSysBook`, `MLSysBook-vault-audit`, `MLSysBook-404`, `MLSysBook-labs-release`). Stay in `MLSysBook-massive-build`.
- ❌ Don't auto-cut a release tag (`v0.1.2` etc.) — single stable commit is the goal, not a release ceremony.

---

## Files of interest

| File | Why |
|---|---|
| `interviews/vault/docs/RESUME_PLAN_2026-04-25.md` | Prior session's plan (completed through Phase 7). |
| `interviews/vault/docs/MASSIVE_BUILD_RUNBOOK.md` | Methodology document — the prior session's runbook. |
| `interviews/vault/_validation_results/coverage_loop/20260425_150712/` | Last loop's per-iter judge_summary.json (PASS/NEEDS_FIX/DROP details with fix_suggestion). |
| `interviews/vault/scripts/iterate_coverage_loop.py` | Main driver. Defaults bumped this session. |
| `interviews/vault/scripts/analyze_coverage_gaps.py` | Priority ranking. |
| `interviews/vault/scripts/gemini_cli_generate_questions.py` | Batched Gemini generation. **Phase B.4 adds validate-at-write here.** |
| `interviews/vault/scripts/gemini_cli_llm_judge.py` | Multi-criteria validator. |
| `interviews/vault/scripts/render_visuals.py` | DOT/matplotlib → SVG. **Phase A.2 fixes silent-failure mode here.** |
| `interviews/vault/scripts/fix_competency_areas.py` | One-time cleanup. REMAP table extended this session. |
| `interviews/vault/scripts/promote_validated.py` | Lifecycle flip. |
| `interviews/vault-cli/src/vault_cli/commands/doctor.py` | **Phase A.5 splits `_check_registry_integrity` into two checks.** |
| `interviews/vault-cli/src/vault_cli/commands/lint.py` | **Phase A.6 updates `zone_level_affinity` rule.** |
| `interviews/vault/id-registry.yaml` | Append-only ID log. **Phase A.5 appends 5,269 missing IDs.** |
| `interviews/staffml/src/data/vault-manifest.json` | GUI's authoritative count. Refresh after every bundle build. |
| `.claude/agents/expert-*.md` | Expert agent definitions for A.6.2. |
| `.claude/agents/consensus-builder.md` | Consensus aggregator for A.6.4. |

---

## One-liner status check (run first in next session)

```bash
cd /Users/VJ/GitHub/MLSysBook-massive-build && \
git log --oneline -3 && echo "---" && \
git status --short | head -10 && echo "---" && \
PYTHONPATH=interviews/vault-cli/src \
python3 -m vault_cli.main check --strict 2>&1 | tail -3 && \
echo "---" && \
PYTHONPATH=interviews/vault-cli/src \
python3 -m vault_cli.main lint interviews/vault/questions/ 2>&1 | tail -3 && \
echo "---" && \
PYTHONPATH=interviews/vault-cli/src \
python3 -m vault_cli.main doctor 2>&1 | tail -10 && \
echo "---" && \
python3 -c "
import json
c = json.load(open('interviews/staffml/src/data/corpus.json'))
print(f'published: {len(c)}')
"
```

If the output shows commit `ece6eccf2`, a clean tree, `vault check` passing, lint reporting 1,308 warnings, and doctor showing the registry fail (5,269/4,479), the resume state is healthy and matches this plan's starting assumptions. **Proceed to Phase A.1.**

If something differs, **stop and reconcile** before starting work.

---

## Pacing

This is a ~17-19 hour push, plausibly 2 focused days or 3-4 calendar days with breaks. The work is heavy on prompt engineering (B.3) and data cleanup (A.6, A.7). Don't rush; the gates are the contract.

There are three explicit user-review checkpoints (A.6.3, B.3', D.2). Wait for sign-off at each before continuing.