Mirror of https://github.com/harvard-edge/cs249r_book.git, synced 2026-05-08 02:28:25 -05:00.
Adds the deterministic and semantic audit tooling used to drive the release-readiness pass on the YAML question corpus:
- audit_yaml_corpus.py — read-only schema + authoring-convention audit
- format_yaml_questions.py — canonical formatter (idempotent)
- fix_yaml_hygiene.py — bulk hygiene fixups
- prepare_semantic_review_queue.py — emit JSONL queues per track for LLM review
- semantic_audit_questions.py — parallel LLM audit runner (gpt-5.4-mini)
- run_semantic_audit_tracks.py — per-track orchestrator wrapping the runner
- build_semantic_fix_queue.py — collect findings into a prioritized fix queue
- compare_semantic_passes.py — diff two semantic-audit passes for stability
- summarize_semantic_audit.py — markdown summary from findings JSONL

Also adds interviews/vault/audit/README.md describing the workflow.

Audit output artifacts (semantic-review-queue/, semantic-review-results/, fresh-yaml-audit/) are produced by these scripts on demand and remain untracked.
Vault Audit Workflow
This directory contains release audit reports and archived experiments for the StaffML YAML question corpus.
Active release workflow:
- Run the canonical formatter in check mode:
  python3 interviews/vault/scripts/format_yaml_questions.py
- Run the deterministic corpus audit:
  python3 interviews/vault/scripts/audit_yaml_corpus.py
- Apply conservative hygiene fixes when needed:
  python3 interviews/vault/scripts/fix_yaml_hygiene.py
- Build semantic review batches:
  python3 interviews/vault/scripts/prepare_semantic_review_queue.py
- Fix all deterministic findings before release.
- Use semantic review for published questions to validate question quality, answer correctness, napkin math, physical plausibility, and level fit.
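The deterministic steps above run in a fixed order and should halt the release pass on the first failure. A minimal sketch of that chaining (script paths are the ones listed above; the `run_steps` helper itself is illustrative and assumes execution from the repository root):

```python
"""Run the deterministic release-audit steps in order, stopping on the
first non-zero exit. Assumes the repository root as working directory."""
import subprocess
import sys

# Deterministic workflow steps from this README, in release order.
STEPS = [
    "interviews/vault/scripts/format_yaml_questions.py",
    "interviews/vault/scripts/audit_yaml_corpus.py",
    "interviews/vault/scripts/fix_yaml_hygiene.py",
    "interviews/vault/scripts/prepare_semantic_review_queue.py",
]


def run_steps(steps=STEPS):
    for script in steps:
        print(f"==> {script}")
        # check=True raises CalledProcessError, aborting on the first failure.
        subprocess.run([sys.executable, script], check=True)


if __name__ == "__main__":
    run_steps()
```

Keeping the step list in one place makes it easy to re-run only a suffix of the pipeline (e.g., skip the formatter) by slicing `STEPS`.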
Recommended semantic review model:
- gpt-5.4-mini for the full corpus pass. It is the default in semantic_audit_questions.py and balances audit quality, latency, and cost.
- gpt-5.5 for selective second opinions on disputed or high-severity findings.
Run a small smoke test before launching the full semantic review:
python3 interviews/vault/scripts/semantic_audit_questions.py \
--limit 2 \
--workers 1 \
--out interviews/vault/audit/semantic-review-results/smoke_semantic_findings.jsonl
Run all published questions in parallel by track:
python3 interviews/vault/scripts/run_semantic_audit_tracks.py --workers-per-track 3 --batch-size 10 --request-timeout 120
Summarize semantic results:
python3 interviews/vault/scripts/summarize_semantic_audit.py
Current active deterministic reports are written to:
- fresh-yaml-audit/summary.md
- fresh-yaml-audit/issues.jsonl
- fresh-yaml-audit/stats.jsonl
Semantic review inputs are written to:
- semantic-review-queue/published_semantic_queue.jsonl
- semantic-review-queue/<track>_published_semantic_queue.jsonl
- semantic-review-queue/batches/<track>/*.jsonl
- semantic-review-queue/semantic_review_prompt.md
Semantic review outputs are written to:
- semantic-review-results/<track>_semantic_findings.jsonl
- semantic-review-results/summary.md
Historical experiments belong under archive/ and should not be used as the
release source of truth.