Vijay Janapa Reddi 90b2abd178 feat(vault): add semantic-audit pipeline for question corpus QA
Adds the deterministic and semantic audit tooling used to drive the
release-readiness pass on the YAML question corpus:

- audit_yaml_corpus.py        — read-only schema + authoring-convention audit
- format_yaml_questions.py    — canonical formatter (idempotent)
- fix_yaml_hygiene.py         — bulk hygiene fixups
- prepare_semantic_review_queue.py — emit JSONL queues per track for LLM review
- semantic_audit_questions.py — parallel LLM audit runner (gpt-5.4-mini)
- run_semantic_audit_tracks.py — per-track orchestrator wrapping the runner
- build_semantic_fix_queue.py — collect findings into a prioritized fix queue
- compare_semantic_passes.py  — diff two semantic-audit passes for stability
- summarize_semantic_audit.py — markdown summary from findings JSONL

Also adds interviews/vault/audit/README.md describing the workflow.

Audit output artifacts (semantic-review-queue/, semantic-review-results/,
fresh-yaml-audit/) are produced by these scripts on demand and remain
untracked.
2026-05-05 09:08:56 -04:00

Vault Audit Workflow

This directory contains release audit reports and archived experiments for the StaffML YAML question corpus.

Active release workflow:

  1. Run the canonical formatter in check mode: python3 interviews/vault/scripts/format_yaml_questions.py
  2. Run the deterministic corpus audit: python3 interviews/vault/scripts/audit_yaml_corpus.py
  3. Apply conservative hygiene fixes when needed: python3 interviews/vault/scripts/fix_yaml_hygiene.py
  4. Build semantic review batches: python3 interviews/vault/scripts/prepare_semantic_review_queue.py
  5. Fix all deterministic findings before release.
  6. Use semantic review on published questions to validate question quality, answer correctness, napkin math, physical plausibility, and level fit.
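
The deterministic portion of the workflow (steps 1–4) can be sketched as a small driver script. This is an illustrative sketch, not part of the repository: `run_steps` and `STEPS` are hypothetical names, and the paths assume you run from the repository root.

```python
import subprocess
import sys

def run_steps(commands):
    """Run each command in order, stopping at the first failure.

    Returns True if every step exited 0, False otherwise.
    """
    for cmd in commands:
        result = subprocess.run(cmd)
        if result.returncode != 0:
            print(f"step failed ({result.returncode}): {' '.join(cmd)}")
            return False
    return True

# Steps 1-4 of the release workflow, in order.
STEPS = [
    [sys.executable, "interviews/vault/scripts/format_yaml_questions.py"],
    [sys.executable, "interviews/vault/scripts/audit_yaml_corpus.py"],
    [sys.executable, "interviews/vault/scripts/fix_yaml_hygiene.py"],
    [sys.executable, "interviews/vault/scripts/prepare_semantic_review_queue.py"],
]

# run_steps(STEPS)  # uncomment to execute the pipeline from the repo root
```

Stopping at the first failing step matters here: a formatting or audit failure upstream would otherwise feed bad input into the semantic-review queue.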

Recommended semantic review model:

  • gpt-5.4-mini for the full corpus pass. It is the default in semantic_audit_questions.py and balances audit quality, latency, and cost.
  • gpt-5.5 for selective second opinions on disputed or high-severity findings.

Run a small smoke test before launching the full semantic review:

python3 interviews/vault/scripts/semantic_audit_questions.py \
  --limit 2 \
  --workers 1 \
  --out interviews/vault/audit/semantic-review-results/smoke_semantic_findings.jsonl
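
Before committing to the full run, it is worth confirming that the smoke-test output is well-formed JSONL. A minimal check, assuming only that each non-empty line is a JSON object (the exact finding schema is defined by semantic_audit_questions.py):

```python
import json
from pathlib import Path

def validate_jsonl(path):
    """Return the record count if every non-empty line parses as a JSON
    object; raise ValueError on the first malformed line."""
    count = 0
    for lineno, line in enumerate(Path(path).read_text().splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines between records
        record = json.loads(line)
        if not isinstance(record, dict):
            raise ValueError(f"line {lineno}: expected a JSON object")
        count += 1
    return count
```

For the smoke test above, point it at semantic-review-results/smoke_semantic_findings.jsonl and confirm the count matches the --limit you passed.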

Run all published questions in parallel by track:

python3 interviews/vault/scripts/run_semantic_audit_tracks.py --workers-per-track 3 --batch-size 10 --request-timeout 120

Summarize semantic results:

python3 interviews/vault/scripts/summarize_semantic_audit.py

The deterministic audit writes its current reports to:

  • fresh-yaml-audit/summary.md
  • fresh-yaml-audit/issues.jsonl
  • fresh-yaml-audit/stats.jsonl
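
issues.jsonl can be tallied directly when triaging. A minimal sketch; the key name "check" below is an assumption about the finding schema, so substitute whatever field audit_yaml_corpus.py actually emits for the rule identifier:

```python
import json
from collections import Counter
from pathlib import Path

def tally_issues(path, key="check"):
    """Count JSONL findings grouped by an assumed rule-name key,
    bucketing records that lack the key under 'unknown'."""
    counts = Counter()
    for line in Path(path).read_text().splitlines():
        if not line.strip():
            continue
        counts[json.loads(line).get(key, "unknown")] += 1
    return counts
```
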

Semantic review inputs are written to:

  • semantic-review-queue/published_semantic_queue.jsonl
  • semantic-review-queue/<track>_published_semantic_queue.jsonl
  • semantic-review-queue/batches/<track>/*.jsonl
  • semantic-review-queue/semantic_review_prompt.md
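
To sanity-check queue sizes before launching a run, the per-track files can be counted with a short sketch that relies only on the <track>_published_semantic_queue.jsonl naming convention shown above:

```python
from pathlib import Path

SUFFIX = "_published_semantic_queue.jsonl"

def queue_sizes(queue_dir):
    """Map track name -> record count for each per-track queue file.

    The combined published_semantic_queue.jsonl has no track prefix,
    so the glob below does not match it.
    """
    sizes = {}
    for path in sorted(Path(queue_dir).glob(f"*{SUFFIX}")):
        track = path.name[: -len(SUFFIX)]
        sizes[track] = sum(1 for ln in path.read_text().splitlines() if ln.strip())
    return sizes
```
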

Semantic review outputs are written to:

  • semantic-review-results/<track>_semantic_findings.jsonl
  • semantic-review-results/summary.md

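The per-track findings files can be merged into a single severity-ordered list, which is conceptually what build_semantic_fix_queue.py does. A sketch under stated assumptions: the "severity" key and its high/medium/low labels are guesses at the findings schema, so adjust them to match what semantic_audit_questions.py actually emits.

```python
import json
from pathlib import Path

SEVERITY_ORDER = {"high": 0, "medium": 1, "low": 2}  # assumed labels

def merge_findings(results_dir):
    """Merge *_semantic_findings.jsonl files into one list,
    highest assumed severity first; unknown severities sort last."""
    findings = []
    for path in sorted(Path(results_dir).glob("*_semantic_findings.jsonl")):
        for line in path.read_text().splitlines():
            if line.strip():
                findings.append(json.loads(line))
    findings.sort(key=lambda f: SEVERITY_ORDER.get(f.get("severity"), 99))
    return findings
```
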
Historical experiments belong under archive/ and should not be used as the release source of truth.