Mirror of https://github.com/harvard-edge/cs249r_book.git (synced 2026-05-07 10:08:50 -05:00)
Adds two new helper modules under interviews/vault-cli/scripts/. They will be
used by the upcoming audit_corpus_batched.py (CORPUS_HARDENING_PLAN.md Phase 3),
and the existing single-call scripts can be migrated to them in a follow-up.
_judges.py exports:
- GEMINI_MODEL (pinned)
- COMMON_MISTAKE_MARKERS (Pitfall/Rationale/Consequence)
- NAPKIN_MATH_MARKERS (Assumptions/Calculations/Conclusion)
- FAILURE_MODE_TAXONOMY (4-mode prose block: physical absurdity,
vendor fabrication, mismatch, arithmetic)
- call_gemini_judge() (subprocess wrapper + lenient JSON parse)
- strip_fences() (response cleanup)
- gate_format() (regex format-compliance gate; costs no API call)
The taxonomy is the same prose block currently inlined in
validate_drafts.py's COHERENCE_PROMPT and audit_chains_with_gemini.py's
audit prompts. Centralizing it means a future failure-mode addition
flows to every judge, not just one script.
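To make the cleanup-and-gate step concrete, here is a minimal sketch of what
strip_fences() and gate_format() could look like. The function names and the
marker tuple come from this commit; the bodies and exact signatures are
illustrative assumptions, not the shipped implementation.

```python
import re
from typing import Sequence

# Assumed marker tuple matching COMMON_MISTAKE_MARKERS described above.
COMMON_MISTAKE_MARKERS = ("Pitfall:", "Rationale:", "Consequence:")


def strip_fences(text: str) -> str:
    """Drop a leading/trailing Markdown code fence (e.g. ```json ... ```)
    so a judge response can be handed to a lenient JSON parse."""
    text = text.strip()
    text = re.sub(r"^```[A-Za-z0-9_-]*[ \t]*\n?", "", text)
    text = re.sub(r"\n?```$", "", text)
    return text.strip()


def gate_format(draft: str, markers: Sequence[str]) -> bool:
    """Cheap pre-judge gate: every required section marker must appear,
    in order, before any paid model call is made."""
    pos = 0
    for marker in markers:
        idx = draft.find(marker, pos)
        if idx == -1:
            return False
        pos = idx + len(marker)
    return True
```

The point of the gate is ordering as well as presence: a draft with the
sections shuffled fails for free, without spending a Gemini call.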
_batching.py exports:
- MAX_PROMPT_CHARS = 320_000 (≈80K tokens, attention sweet spot)
- DEFAULT_WRAPPER_CHARS (4K headroom for prompt scaffolding)
- pack_batches[T]() (generic char-budgeted batcher with
optional hard item cap)
Generalized from audit_chains_with_gemini.py:batch_chains and
build_chains_with_gemini.py:plan_batches. Properties documented in the
docstring (preserves order, no items lost, oversized items still land
in a batch).
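A greedy batcher with those three properties can be sketched as follows. The
constant values and the documented invariants are from this commit; the body
and the size_of/max_items parameter names are assumptions about the shape of
pack_batches, not the actual code.

```python
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")

MAX_PROMPT_CHARS = 320_000       # ~80K tokens, attention sweet spot
DEFAULT_WRAPPER_CHARS = 4_000    # headroom for prompt scaffolding


def pack_batches(
    items: Sequence[T],
    size_of: Callable[[T], int],
    max_chars: int = MAX_PROMPT_CHARS - DEFAULT_WRAPPER_CHARS,
    max_items: Optional[int] = None,
) -> List[List[T]]:
    """Greedy char-budgeted batcher.

    Invariants: preserves input order, loses no items, and an item larger
    than the whole budget still lands in a (singleton) batch.
    """
    batches: List[List[T]] = []
    current: List[T] = []
    current_chars = 0
    for item in items:
        n = size_of(item)
        over_budget = bool(current) and current_chars + n > max_chars
        over_cap = max_items is not None and len(current) >= max_items
        if over_budget or over_cap:
            batches.append(current)
            current, current_chars = [], 0
        current.append(item)
        current_chars += n
    if current:
        batches.append(current)
    return batches
```

Flushing only when `current` is non-empty is what keeps oversized items from
being dropped: they start a fresh batch and ship alone.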
Follow-ups:
- migrate validate_drafts.py and audit_chains_with_gemini.py to use
_judges.call_gemini_judge instead of their inlined wrappers (out of
scope here; non-blocking for the audit work).