Mirror of https://github.com/harvard-edge/cs249r_book.git (synced 2026-05-07 10:08:50 -05:00)
Adds two new helper modules under interviews/vault-cli/scripts/. They will be
used by the upcoming audit_corpus_batched.py (CORPUS_HARDENING_PLAN.md Phase 3),
and the existing single-call scripts can be migrated to them in a follow-up.
_judges.py exports:
- GEMINI_MODEL (pinned)
- COMMON_MISTAKE_MARKERS (Pitfall/Rationale/Consequence)
- NAPKIN_MATH_MARKERS (Assumptions/Calculations/Conclusion)
- FAILURE_MODE_TAXONOMY (4-mode prose block: physical absurdity,
vendor fabrication, mismatch, arithmetic)
- call_gemini_judge() (subprocess wrapper + lenient JSON parse)
- strip_fences() (response cleanup)
- gate_format() (regex format-compliance gate; costs no API call)
The taxonomy is the same prose block currently inlined in
validate_drafts.py's COHERENCE_PROMPT and audit_chains_with_gemini.py's
audit prompts. Centralizing it means a future failure-mode addition
flows to every judge, not just one script.
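To make the cleanup-and-gate step concrete, here is a minimal sketch of what
strip_fences() and gate_format() could look like. The function names and the
marker tuple come from this commit; the bodies and exact signatures are
illustrative assumptions, not the shipped implementation.

```python
import re
from typing import Sequence

# Assumed marker tuple matching COMMON_MISTAKE_MARKERS described above.
COMMON_MISTAKE_MARKERS = ("Pitfall:", "Rationale:", "Consequence:")


def strip_fences(text: str) -> str:
    """Drop a leading/trailing Markdown code fence (e.g. ```json ... ```)
    so a judge response can be handed to a lenient JSON parse."""
    text = text.strip()
    text = re.sub(r"^```[A-Za-z0-9_-]*[ \t]*\n?", "", text)
    text = re.sub(r"\n?```$", "", text)
    return text.strip()


def gate_format(draft: str, markers: Sequence[str]) -> bool:
    """Cheap pre-judge gate: every required section marker must appear,
    in order, before any paid model call is made."""
    pos = 0
    for marker in markers:
        idx = draft.find(marker, pos)
        if idx == -1:
            return False
        pos = idx + len(marker)
    return True
```

The point of the gate is ordering as well as presence: a draft with the
sections shuffled fails for free, without spending a Gemini call.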
_batching.py exports:
- MAX_PROMPT_CHARS = 320_000 (≈80K tokens, attention sweet spot)
- DEFAULT_WRAPPER_CHARS (4K headroom for prompt scaffolding)
- pack_batches[T]() (generic char-budgeted batcher with
optional hard item cap)
Generalized from audit_chains_with_gemini.py:batch_chains and
build_chains_with_gemini.py:plan_batches. Properties documented in the
docstring (preserves order, no items lost, oversized items still land
in a batch).
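A greedy batcher with those three properties can be sketched as follows. The
constant values and the documented invariants are from this commit; the body
and the size_of/max_items parameter names are assumptions about the shape of
pack_batches, not the actual code.

```python
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")

MAX_PROMPT_CHARS = 320_000       # ~80K tokens, attention sweet spot
DEFAULT_WRAPPER_CHARS = 4_000    # headroom for prompt scaffolding


def pack_batches(
    items: Sequence[T],
    size_of: Callable[[T], int],
    max_chars: int = MAX_PROMPT_CHARS - DEFAULT_WRAPPER_CHARS,
    max_items: Optional[int] = None,
) -> List[List[T]]:
    """Greedy char-budgeted batcher.

    Invariants: preserves input order, loses no items, and an item larger
    than the whole budget still lands in a (singleton) batch.
    """
    batches: List[List[T]] = []
    current: List[T] = []
    current_chars = 0
    for item in items:
        n = size_of(item)
        over_budget = bool(current) and current_chars + n > max_chars
        over_cap = max_items is not None and len(current) >= max_items
        if over_budget or over_cap:
            batches.append(current)
            current, current_chars = [], 0
        current.append(item)
        current_chars += n
    if current:
        batches.append(current)
    return batches
```

Flushing only when `current` is non-empty is what keeps oversized items from
being dropped: they start a fresh batch and ship alone.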
Follow-ups:
- migrate validate_drafts.py and audit_chains_with_gemini.py to use
_judges.call_gemini_judge instead of their inlined wrappers (out of
scope here; non-blocking for the audit work).