Vault Audit Workflow
This directory contains release audit reports and archived experiments for the StaffML YAML question corpus.
Active release workflow:
- Run the canonical formatter in check mode:
  python3 interviews/vault/scripts/format_yaml_questions.py
- Run the deterministic corpus audit:
  python3 interviews/vault/scripts/audit_yaml_corpus.py
- Apply conservative hygiene fixes when needed:
  python3 interviews/vault/scripts/fix_yaml_hygiene.py
- Build semantic review batches:
  python3 interviews/vault/scripts/prepare_semantic_review_queue.py
- Fix all deterministic findings before release.
- Use semantic review for published questions to validate question quality, answer correctness, napkin math, physical plausibility, and level fit.
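The formatter and audit steps above can be chained so that a failure stops the sequence before release. The following is a minimal sketch, assuming each script signals failure with a nonzero exit code and is invoked from the repository root; neither assumption is confirmed by this README. `run_steps` and `DETERMINISTIC_STEPS` are hypothetical names, not part of the vault scripts.

```python
import subprocess
import sys

# Script paths are copied from the workflow list; the order mirrors that list.
# The hygiene fixer is left out because it is only applied "when needed".
DETERMINISTIC_STEPS = [
    "interviews/vault/scripts/format_yaml_questions.py",
    "interviews/vault/scripts/audit_yaml_corpus.py",
]


def run_steps(steps, runner=subprocess.run):
    """Run each script in order; return the first script that fails, or None.

    Assumption: a nonzero exit code means the step failed.
    The runner is injectable so the control flow can be exercised without
    touching the real scripts.
    """
    for script in steps:
        result = runner([sys.executable, script])
        if result.returncode != 0:
            return script
    return None
```

Calling `run_steps(DETERMINISTIC_STEPS)` returns `None` when every step exits with code 0, and otherwise the path of the first failing script, which a wrapper could pass to `sys.exit`.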
Recommended semantic review model:
- gpt-5.4-mini for the full corpus pass. It is the default in semantic_audit_questions.py and balances audit quality, latency, and cost.
- gpt-5.5 for selective second opinions on disputed or high-severity findings.
Run a small smoke test before launching the full semantic review:
python3 interviews/vault/scripts/semantic_audit_questions.py \
--limit 2 \
--workers 1 \
--out interviews/vault/audit/semantic-review-results/smoke_semantic_findings.jsonl
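After the smoke test, it can help to confirm that the output file parses before launching the full run. The sketch below assumes only what the `.jsonl` extension implies, namely one JSON value per line; the record fields are not documented here, so the helper does not rely on any of them. `load_jsonl` is a hypothetical helper.

```python
import json
from pathlib import Path


def load_jsonl(path):
    """Return the parsed records of a JSONL file, skipping blank lines.

    Raises ValueError naming the file and line number of the first
    line that is not valid JSON.
    """
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as exc:
                raise ValueError(
                    f"{path}:{line_number}: invalid JSON ({exc.msg})"
                ) from exc
    return records
```

For example, `len(load_jsonl("interviews/vault/audit/semantic-review-results/smoke_semantic_findings.jsonl"))` gives the number of records written by the smoke run.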
Run all published questions in parallel by track:
python3 interviews/vault/scripts/run_semantic_audit_tracks.py --workers-per-track 3 --batch-size 10 --request-timeout 120
Summarize semantic results:
python3 interviews/vault/scripts/summarize_semantic_audit.py
Current active deterministic reports are written to:
- fresh-yaml-audit/summary.md
- fresh-yaml-audit/issues.jsonl
- fresh-yaml-audit/stats.jsonl
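Because all deterministic findings must be fixed before release, the issues report can serve as a simple release gate. This is a sketch under a stated assumption: each non-blank line of `fresh-yaml-audit/issues.jsonl` is one open finding, which the README does not confirm. `count_findings` and `release_blocked` are hypothetical helpers.

```python
from pathlib import Path


def count_findings(issues_path):
    """Count non-blank lines in a JSONL report; return None if the file is missing."""
    path = Path(issues_path)
    if not path.exists():
        return None  # the audit has not been run, so nothing can be concluded
    with path.open(encoding="utf-8") as handle:
        return sum(1 for line in handle if line.strip())


def release_blocked(issues_path):
    """Treat a missing report or any recorded finding as blocking.

    Assumption: one non-blank line corresponds to one open finding.
    """
    count = count_findings(issues_path)
    return count is None or count > 0
```

Treating a missing report as blocking is a deliberate choice in this sketch: an audit that never ran should not be mistaken for a clean one.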
Semantic review inputs are written to:
- semantic-review-queue/published_semantic_queue.jsonl
- semantic-review-queue/<track>_published_semantic_queue.jsonl
- semantic-review-queue/batches/<track>/*.jsonl
- semantic-review-queue/semantic_review_prompt.md
Semantic review outputs are written to:
- semantic-review-results/<track>_semantic_findings.jsonl
- semantic-review-results/summary.md
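For a quick look at how much each track produced, independent of summarize_semantic_audit.py, the per-track files can be counted directly. The sketch relies on the `<track>_semantic_findings.jsonl` naming shown above and assumes one record per non-blank line; `findings_per_track` is a hypothetical helper and does not reproduce the official summary.

```python
from pathlib import Path

# File-name suffix taken from the output paths listed above.
SUFFIX = "_semantic_findings.jsonl"


def findings_per_track(results_dir):
    """Map each track name to the number of non-blank lines in its findings file."""
    counts = {}
    for path in sorted(Path(results_dir).glob(f"*{SUFFIX}")):
        track = path.name[: -len(SUFFIX)]  # strip the suffix to recover <track>
        with path.open(encoding="utf-8") as handle:
            counts[track] = sum(1 for line in handle if line.strip())
    return counts
```

Note that a smoke-test output named smoke_semantic_findings.jsonl in the same directory matches this pattern and would be reported as a track called "smoke"; filter it out if that is not wanted.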
Historical experiments belong under archive/ and should not be used as the
release source of truth.