mirror of
https://github.com/harvard-edge/cs249r_book.git
synced 2026-05-08 09:57:21 -05:00
Closes the cleanup arc (A.1–A.10 in RESUME_PLAN_RELEASE.md). Every
gate is now green: vault check --strict, vault lint, vault doctor,
vault codegen --check, staffml validate-vault, Playwright (9/9), tsc.
A.1 mobile-1962.svg: renamed `Edge` → `RegEdge` in graphviz source
(`Edge` is a reserved keyword); SVG renders cleanly. Also fixed
tinyml-1570.py (missing `import numpy as np`) which the new failure
log surfaced.
A.2 render_visuals.py: structured per-ID failure log written to
`_validation_results/render_failures.json` on every run; non-zero
exit on any per-item crash; new `--fail-fast` and `--failure-log`
CLI options. Replaces the prior silent-failure mode.
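For downstream gates, consuming the structured log is a few lines. A sketch: `render_gate_passed` is a hypothetical helper name, but the JSON keys ("errors", "failures", each failure's "id"/"error") match the log render_visuals.py writes.

```python
import json
from pathlib import Path

def render_gate_passed(log_path: Path) -> bool:
    """Return True iff the last render run recorded zero per-item failures."""
    report = json.loads(log_path.read_text(encoding="utf-8"))
    for failure in report["failures"]:
        # Surface each failed ID and its error for the CI log.
        print(f"render failure: {failure['id']}: {failure['error']}")
    return report["errors"] == 0
```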
A.3 LinkML visual schema: typed as a structured sub-schema. New
`VisualKind` enum (svg only — `mermaid` was reserved but never
shipped, dropped to keep the enum honest). Path regex tightened
to `^[a-z0-9-]+\.svg$`. Alt minimum length 10; caption now required,
minimum length 5. TypeScript Visual interface + Question.visual
field added to staffml-vault-types/index.ts.
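The tightened path rule is easy to sanity-check directly (the pattern is the one above; the example filenames are illustrative):

```python
import re

VISUAL_PATH_RE = re.compile(r"^[a-z0-9-]+\.svg$")

assert VISUAL_PATH_RE.match("mobile-1962.svg")      # lowercase id + .svg: ok
assert not VISUAL_PATH_RE.match("Mobile_1962.svg")  # uppercase/underscore: rejected
assert not VISUAL_PATH_RE.match("mobile-1962.png")  # only .svg ships
assert not VISUAL_PATH_RE.match("a/b.svg")          # no path separators
```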
A.4 Pydantic Visual + Question validators:
- Visual.kind hard-rejects anything but `svg`
- Visual.path enforces the new regex
- Visual.alt min 10 chars, caption required min 5 chars
- Question.model_validator: visual.path MUST resolve to a real
file under interviews/vault/visuals/<track>/. Skipped in
production deploys where the working tree is absent.
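The same four rules, as a dependency-free sketch (the real checks live in the Pydantic models; `validate_visual` and its dict input are illustrative):

```python
import re

VISUAL_PATH_RE = re.compile(r"^[a-z0-9-]+\.svg$")

def validate_visual(visual: dict) -> list[str]:
    """Return validation errors for a visual block (empty list = valid)."""
    errors = []
    if visual.get("kind") != "svg":
        errors.append(f"kind must be 'svg', got {visual.get('kind')!r}")
    if not VISUAL_PATH_RE.match(visual.get("path", "")):
        errors.append("path must match ^[a-z0-9-]+\\.svg$")
    if len(visual.get("alt", "")) < 10:
        errors.append("alt must be at least 10 characters")
    if len(visual.get("caption", "")) < 5:
        errors.append("caption is required, minimum 5 characters")
    return errors
```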
A.5 Registry repair + doctor split:
- tools: repair_registry.py appended 5,269 missing IDs
(the rename refactor at 8a5c3ff3c left the append-only registry
unsynced; this brings disk-coverage to 100%). Header block in
id-registry.yaml documents the rebuild rationale.
- doctor.py: split symmetric `registry-integrity` check into
`disk-coverage` (HARD FAIL if any disk YAML id is unregistered)
and `registry-history` (INFO ONLY for retired ids — the registry
is by design an audit log, retired ids are normal). Pre-existing
`_check_schema_version` bug (`versions == {1}` vs string `"1.0"`)
fixed.
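The `_check_schema_version` bug was a type mismatch: the check compared the collected version set against the int set `{1}`, while versions are stored as the string `"1.0"`, so it could never pass. A minimal sketch of the fixed check (function and parameter names are illustrative):

```python
def schema_versions_ok(versions: set[str]) -> bool:
    """Versions are strings; the old comparison against {1} (int) never matched."""
    return versions == {"1.0"}
```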
A.6 Lint calibration via 4-expert consensus + bloom-canonical
reclassification:
- Spawned 4 experts (Vijay Reddi, Chip Huyen, Jeff Dean,
education-reviewer) on 42 disputed (zone, level) pairs;
consensus-builder aggregated to 15 valid / 19 invalid / 8
borderline.
- User arbitrated 8 borderlines: 7 widen / 1 reclassify.
- Built ZONE_BLOOM_AFFINITY matrix (Education-Reviewer's idea):
every zone admits its dominant Bloom verb + adjacent verbs,
rejects clear hierarchy violations.
- reclassify_zone_bloom_mismatch.py applied 576 deterministic
zone fixes via BLOOM_CANONICAL_ZONE mapping (e.g. fluency+analyze
→ analyze, recall+analyze → analyze, evaluation+apply → implement).
- Question.model_validator(_zone_bloom_compatible): hard-rejects
future zone-bloom mismatches at write time. Generated drafts
can no longer ship a self-contradicting classification.
- ZONE_LEVEL_AFFINITY widened per consensus + arbitration +
post-reclassification adjustments. Lint warnings: 1,308 → 0.
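The deterministic fix reduces to two lookups. An illustrative sketch: only the three mappings documented above are real; the affinity sets and any other entries are assumptions.

```python
# Bloom verb -> the zone that canonically owns it (illustrative subset).
BLOOM_CANONICAL_ZONE = {"analyze": "analyze", "apply": "implement"}

# Zone -> Bloom verbs it admits (illustrative; real matrix is wider).
ZONE_BLOOM_AFFINITY = {
    "recall": {"remember", "understand"},
    "analyze": {"analyze", "evaluate"},
}

def reclassify(zone: str, bloom: str) -> str:
    """Move a mismatched (zone, bloom) pair to the bloom's canonical zone."""
    if bloom in ZONE_BLOOM_AFFINITY.get(zone, set()):
        return zone  # already compatible, leave it
    return BLOOM_CANONICAL_ZONE.get(bloom, zone)
```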
A.7 Chain integrity:
- repair_chains.py: drops chain refs when a chain has <2 published
members (chain ceases to exist), renumbers all members of any
chain whose positions are non-sequential / duplicated /
non-monotonic-by-level. Sort key: level ascending, then old
position, then qid (deterministic).
- validate-vault.py: relaxed sequential check to unique-positions
check. Position gaps from mid-chain deletions are normal; what
matters is uniqueness + bloom-monotonicity (vault check --strict
enforces both from YAML source-of-truth).
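The renumbering pass reduces to one deterministic sort (a sketch; the member-dict shape is an assumption, the sort key is the one repair_chains.py uses):

```python
def renumber_chain(members: list[dict]) -> list[dict]:
    """Renumber positions 1..N: level ascending, then old position, then qid."""
    ordered = sorted(members, key=lambda m: (m["level"], m["position"], m["qid"]))
    for new_pos, member in enumerate(ordered, start=1):
        member["position"] = new_pos
    return ordered
```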
A.8 Practice page visual + zoom modal:
- QuestionVisual.tsx: wraps the `<img>` in `<Zoom>` from
react-medium-image-zoom (4 KB). Click image → fullscreen
`<dialog data-rmiz-modal>`; ESC closes. Added test-id
`question-visual-img` for stable selector.
- New Playwright test: 9th in the suite, deep-links cloud-4492,
asserts the dialog opens on click and closes on ESC.
- TypeScript: removed `mermaid` from local Visual types in
corpus.ts and corpus-vault.ts; tsc clean.
A.9 All gates green:
- vault check --strict: 0 errors / 0 invariant failures
- vault lint: 0 errors / 0 warnings (was 1,308 warnings)
- vault codegen --check: artifacts in sync (hash baseline updated)
- vault doctor: 0 fails (registry-history info; git-state warn
for uncommitted changes predating this commit)
- staffml validate-vault: 0 errors / 0 warnings, deployment-ready
- Playwright: 9/9 pass (was 8; +zoom modal test)
- render_visuals: 0 errors (was 2 silent failures pre-A.2)
- tsc: clean
Distribution after reclassification: 9,544 published unchanged;
576 items moved zone via bloom-canonical mapping (full per-item
report at /tmp/reclassify_changes.csv). Chain count 879 → 850
after orphan-singleton drops. release_hash updated.
Carry-forward to next session (Phase B):
- Priority gap closure for parallelism cells + global L4-L6+
(the run that produced this corpus did not close the targeted
cells; B.3 needs specialized prompts per cell-class)
- 120 NEEDS_FIX items from coverage_loop/20260425_150712/ still
carry judge fix_suggestions; spawn fix-agent in Phase C
276 lines
10 KiB
Python
#!/usr/bin/env python3
"""Render question visuals to ship-ready SVG.

The schema the website cares about is minimal::

    visual:
      kind: svg              # always svg — that's what the website ships
      path: <id>.svg         # the static asset
      alt: <text>
      caption: <text>
      # Build metadata (optional, ignored by website):
      source_format: dot | matplotlib | hand   # default: hand

The runtime story for the practice page is "load `<id>.svg` as an
`<img>`". The renderer's job is to produce that SVG when it's a build
artifact (DOT or matplotlib source). For hand-authored SVGs, the source
IS the asset, no build needed.

Source files live next to the asset by naming convention:

    interviews/vault/visuals/<track>/<id>.svg   # the asset (always)
    interviews/vault/visuals/<track>/<id>.dot   # iff source_format=dot
    interviews/vault/visuals/<track>/<id>.py    # iff source_format=matplotlib

Usage:

    python3 render_visuals.py                   # render all stale
    python3 render_visuals.py --force           # force re-render
    python3 render_visuals.py --id cloud-2846   # single question
    python3 render_visuals.py --dry-run         # plan only

Architecture: see interviews/vault/visuals/ARCHITECTURE.md.
"""

from __future__ import annotations

import argparse
import json
import os
import re
import subprocess
import sys
from datetime import datetime, timezone
from pathlib import Path
from typing import Any

import yaml

VAULT_DIR = Path(__file__).resolve().parent.parent
QUESTIONS_DIR = VAULT_DIR / "questions"
VISUALS_DIR = VAULT_DIR / "visuals"
DEFAULT_FAILURE_LOG = VAULT_DIR / "_validation_results" / "render_failures.json"

VALID_KINDS = {"svg"}  # what the website renders
# Source format is inferred from the filesystem: a `<basename>.dot` next
# to `<basename>.svg` means DOT-built; `.py` means matplotlib-built; no
# sibling means hand-authored. We do not encode this in the YAML schema.
SOURCE_EXT_TO_FORMAT = {".dot": "dot", ".py": "matplotlib"}

# Book SVG style: enforce these properties on every rendered SVG so DOT,
# matplotlib, and hand-SVG outputs render identically in the practice page.
SVG_FONT_FAMILY = "Helvetica Neue, Helvetica, Arial, sans-serif"


# ---------------------------------------------------------------------------
# Discovery
# ---------------------------------------------------------------------------

def discover_visuals() -> list[dict[str, Any]]:
    """Return one record per question with a `visual:` block.

    Reads only the production-schema fields: kind, path, alt, caption.
    Build metadata: source_format (optional).
    """
    records = []
    for yaml_path in QUESTIONS_DIR.glob("**/*.yaml"):
        try:
            data = yaml.safe_load(yaml_path.read_text(encoding="utf-8"))
        except Exception as exc:
            print(f" ! {yaml_path}: parse error {exc}", file=sys.stderr)
            continue
        if not data or "visual" not in data:
            continue
        v = data["visual"]
        if not isinstance(v, dict):
            continue

        kind = v.get("kind", "svg")
        if kind not in VALID_KINDS:
            print(f" ! {data.get('id')}: unsupported kind={kind!r} "
                  f"(only {VALID_KINDS} ship)", file=sys.stderr)
            continue

        path = v.get("path")
        if not path:
            print(f" ! {data.get('id')}: missing visual.path", file=sys.stderr)
            continue

        track = data.get("track", "global")
        track_dir = VISUALS_DIR / track
        asset_path = track_dir / path

        # Infer source format by filesystem: look for a sibling file with
        # a known build-source extension. Absent => hand-authored.
        basename = asset_path.with_suffix("")
        source_path = None
        source_format = "hand"
        for ext, fmt in SOURCE_EXT_TO_FORMAT.items():
            cand = basename.with_suffix(ext)
            if cand.exists():
                source_path = cand
                source_format = fmt
                break

        records.append({
            "id": data["id"],
            "track": track,
            "asset_path": asset_path,
            "source_path": source_path,
            "source_format": source_format,
            "yaml_path": yaml_path,
        })
    return records


# ---------------------------------------------------------------------------
# Rendering
# ---------------------------------------------------------------------------

def render_one(rec: dict[str, Any], force: bool = False, dry_run: bool = False) -> str:
    """Render or pass through a single visual. Returns 'rendered'|'skipped'|error."""
    qid = rec["id"]
    fmt = rec["source_format"]
    src = rec["source_path"]
    out = rec["asset_path"]

    if fmt == "hand":
        if not out.exists():
            return f"error:{qid}:hand-authored asset missing at {out}"
        return "skipped"

    # Build artifact path: needs source
    if not src or not src.exists():
        return f"error:{qid}:{fmt} source missing at {src}"
    if not force and out.exists() and out.stat().st_mtime >= src.stat().st_mtime:
        return "skipped"
    if dry_run:
        print(f" + would render {qid}: {src.name} -> {out.name}")
        return "rendered"

    out.parent.mkdir(parents=True, exist_ok=True)
    if fmt == "dot":
        _render_dot(src, out)
    else:
        _render_matplotlib(src, out)
    _normalize_svg(out)
    return "rendered"


def _render_dot(src: Path, out: Path) -> None:
    result = subprocess.run(
        ["dot", "-Tsvg", str(src), "-o", str(out)],
        capture_output=True, text=True, timeout=30,
    )
    if result.returncode != 0:
        raise RuntimeError(f"dot failed: {result.stderr.strip()}")


def _render_matplotlib(src: Path, out: Path) -> None:
    """Execute the source script with VISUAL_OUT_PATH env var."""
    env = dict(os.environ)
    env["VISUAL_OUT_PATH"] = str(out)
    result = subprocess.run(
        ["python3", str(src)],
        capture_output=True, text=True, timeout=60, env=env,
    )
    if result.returncode != 0:
        raise RuntimeError(f"matplotlib script failed: {result.stderr.strip()}")
    if not out.exists():
        raise RuntimeError(
            f"matplotlib script ran but did not write to {out}; "
            "did the script use os.environ['VISUAL_OUT_PATH']?"
        )


def _normalize_svg(path: Path) -> None:
    """Apply book-style normalization to a rendered SVG."""
    text = path.read_text(encoding="utf-8")
    if "data:image/" in text or "<image " in text:
        raise RuntimeError(f"{path} contains embedded raster — not allowed")
    text = re.sub(r"<!--\s*Generated by [^>]*?-->\s*", "", text)
    if "font-family=" not in text.split(">", 1)[0]:
        text = re.sub(
            r"<svg(\s[^>]*?)>",
            lambda m: f'<svg{m.group(1)} font-family="{SVG_FONT_FAMILY}">',
            text, count=1,
        )
    path.write_text(text, encoding="utf-8")


# ---------------------------------------------------------------------------
# CLI
# ---------------------------------------------------------------------------

def main() -> int:
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument("--force", action="store_true",
                        help="Re-render even if output is fresh.")
    parser.add_argument("--dry-run", action="store_true")
    parser.add_argument("--id", help="Render only this question id.")
    parser.add_argument("--fail-fast", action="store_true",
                        help="Abort on first per-item failure instead of "
                             "continuing through the batch.")
    parser.add_argument("--failure-log", type=Path, default=DEFAULT_FAILURE_LOG,
                        help=f"Path for structured per-ID failure log "
                             f"(default: {DEFAULT_FAILURE_LOG}). Always "
                             "written on completion; empty array if no "
                             "failures.")
    args = parser.parse_args()

    recs = discover_visuals()
    if args.id:
        recs = [r for r in recs if r["id"] == args.id]
        if not recs:
            print(f"No visual found for id={args.id}", file=sys.stderr)
            return 1

    print(f"Discovered {len(recs)} visual(s).")
    counts = {"rendered": 0, "skipped": 0, "error": 0}
    failures: list[dict[str, Any]] = []
    for rec in recs:
        try:
            status = render_one(rec, force=args.force, dry_run=args.dry_run)
        except Exception as exc:
            status = f"error:{rec['id']}:{exc}"
        if status.startswith("error"):
            print(f" ✗ {status}")
            counts["error"] += 1
            # Parse "error:<qid>:<message>" into a structured record
            parts = status.split(":", 2)
            qid = parts[1] if len(parts) > 1 else rec["id"]
            err_msg = parts[2] if len(parts) > 2 else status
            failures.append({
                "id": qid,
                "track": rec["track"],
                "source_format": rec["source_format"],
                "source_path": str(rec["source_path"]) if rec["source_path"] else None,
                "asset_path": str(rec["asset_path"]),
                "error": err_msg.strip(),
            })
            if args.fail_fast:
                print(" ! --fail-fast set; aborting batch", file=sys.stderr)
                break
        else:
            print(f" {'✓' if status == 'rendered' else '·'} {rec['id']:30s} "
                  f"[{rec['source_format']:11s}] {status}")
            counts[status] += 1

    # Always write the failure log so downstream consumers (CI, judge step,
    # release_gate.sh) can read a stable path. Empty array if no failures.
    args.failure_log.parent.mkdir(parents=True, exist_ok=True)
    args.failure_log.write_text(json.dumps({
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "total_visuals": len(recs),
        "rendered": counts["rendered"],
        "skipped": counts["skipped"],
        "errors": counts["error"],
        "failures": failures,
    }, indent=2), encoding="utf-8")
    print(f"\nFailure log: {args.failure_log}")
    print(f"Summary: rendered={counts['rendered']} "
          f"skipped={counts['skipped']} errors={counts['error']}")
    return 1 if counts["error"] else 0


if __name__ == "__main__":
    sys.exit(main())