The gemini CLI silently overrides --yolo to default approval mode when
its cwd is not in the trusted-folders list (e.g., a tempfile.gettempdir
scratch dir). The override is logged to stderr as 'Approval mode
overridden to "default" because the current folder is not trusted'
and the call exits 55. --skip-trust opts out of that gate. Verified
2026-05-04 in /tmp/gemini-trust-test.
The gemini CLI in --yolo mode occasionally writes scratch files
(prompt_candidates.json, audit.py, evaluate_*.py, partial JSON outputs)
to its CWD. When invoked from the repo root those landed alongside the
worktree and polluted git status with ~30 untracked files.
Fix: pass cwd=tempfile.gettempdir()/vault_audit_gemini_scratch to
subprocess.run. The scratch dir is created lazily on import.
This doesn't affect Gemini's output (we capture stdout) or the
prompt (we pass via -p). It just keeps the gemini CLI's incidental
file-system side effects out of the worktree.
CORPUS_HARDENING_PLAN.md Phase 3 (delayed reliability fix).
Three bugs surfaced by the global-track canary run (2026-05-03,
20260503T123116Z), all fixed:
1. Gemini-CLI subprocess timeout was 240s; canary's average call took
~167s with 72K-char prompts occasionally exceeding 240s and getting
killed mid-call. 60 questions (2 batches) returned no Gemini
response. Bumped default timeout in _judges.call_gemini_judge()
to 600s (≈3× typical, still triggers fast on a stuck call).
2. Resume logic in run_audit() treated ANY persisted row as "audited,"
including the placeholder rows for batches that errored. That meant
re-running on the same output dir would skip the failed batches
forever. Fixed: only rows with format_compliance != "error" are
added to seen_qids, so a re-run retries the failures.
3. --output passed as a relative path crashed on
`outdir.relative_to(REPO_ROOT)` because relative paths don't share
the absolute REPO_ROOT prefix. Fixed: resolve outdir to absolute
immediately after computing it.
Validation: re-ran the canary on the same output dir with all three
fixes. Resume correctly skipped the 9 good batches, retried the 2
errored batches, and both completed cleanly in 785s. All 313 global
questions now have real Gemini verdicts (0 errors).
Canary findings:
format_compliance: 21 fails, 99.6% Gemini-vs-regex agreement
level_fit: 48 fails (15.3% — the predicted level-inflation
pattern; flagged for Phase 5 review)
coherence: 18 fails
math_correct: 8 fails
title_quality: 16 placeholders (matches regex 1:1)
CORPUS_HARDENING_PLAN.md Phase 4 (canary leg).
Two new helper modules under interviews/vault-cli/scripts/. Used by the
upcoming audit_corpus_batched.py (CORPUS_HARDENING_PLAN.md Phase 3) and
extractable from the existing single-call scripts in a follow-up.
_judges.py exports:
- GEMINI_MODEL (pinned)
- COMMON_MISTAKE_MARKERS (Pitfall/Rationale/Consequence)
- NAPKIN_MATH_MARKERS (Assumptions/Calculations/Conclusion)
- FAILURE_MODE_TAXONOMY (4-mode prose block: physical absurdity,
vendor fabrication, mismatch, arithmetic)
- call_gemini_judge() (subprocess wrapper + lenient JSON parse)
- strip_fences() (response cleanup)
- gate_format() (regex format-compliance gate, free)
The taxonomy is the same prose block currently inlined in
validate_drafts.py's COHERENCE_PROMPT and audit_chains_with_gemini.py's
audit prompts. Centralizing it means a future failure-mode addition
flows to every judge, not just one script.
_batching.py exports:
- MAX_PROMPT_CHARS = 320_000 (≈80K tokens, attention sweet spot)
- DEFAULT_WRAPPER_CHARS (4K headroom for prompt scaffolding)
- pack_batches[T]() (generic char-budgeted batcher with
optional hard item cap)
Generalized from audit_chains_with_gemini.py:batch_chains and
build_chains_with_gemini.py:plan_batches. Properties documented in the
docstring (preserves order, no items lost, oversized items still land
in a batch).
Followups:
- migrate validate_drafts.py and audit_chains_with_gemini.py to use
_judges.call_gemini_judge instead of their inlined wrappers (out of
scope here; non-blocking for the audit work).
CORPUS_HARDENING_PLAN.md Phase 3.