Two new pieces close the generation→validation→saturation feedback loop:
1. gemini_cli_llm_judge.py — multi-criteria validator. For each draft it
judges math correctness, cell fit (does the draft actually target its
declared track/zone/level?), scenario realism, uniqueness versus the
canonical questions, and visual-asset alignment, and returns
PASS/NEEDS_FIX/DROP per item. Calls are batched (default 15 items per
call) for budget efficiency; a sketch of the verdict contract follows
this list.
2. iterate_coverage_loop.py — drives the full loop:
analyze → plan → generate → render → judge → apply → re-analyze.
Self-paced: stops when (a) top priority gap drops below threshold,
(b) DROP rate exceeds the saturation/hallucination threshold,
(c) total API calls exceed budget, or (d) the same cell is top
priority for two iterations in a row (convergence). The user no
longer specifies "how many questions" — the loop generates until
the corpus reaches a measurable steady state.
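To pin down item 1's contract, here is a minimal sketch of the per-item
verdict and the batching. Only the three verdicts, the five criteria, and
the default batch size of 15 come from the description above; the class
and function names and the score-to-verdict mapping are illustrative
assumptions, not gemini_cli_llm_judge.py's actual interface.

```python
# Sketch of the judge's per-item verdict and batching. The verdicts, the five
# criteria, and the batch-size default of 15 match the description above;
# everything else (names, the score->verdict mapping) is an assumption.
from dataclasses import dataclass
from enum import Enum
from typing import Iterator

class Verdict(Enum):
    PASS = "PASS"
    NEEDS_FIX = "NEEDS_FIX"
    DROP = "DROP"

CRITERIA = (
    "math_correctness",
    "cell_fit",                  # targets the declared track/zone/level?
    "scenario_realism",
    "uniqueness_vs_canonical",   # distinct from the canonical questions?
    "visual_asset_alignment",
)

@dataclass
class Judgment:
    item_id: str
    scores: dict[str, bool]      # criterion name -> passed?
    verdict: Verdict

def to_verdict(scores: dict[str, bool]) -> Verdict:
    """One plausible mapping: fatal failures drop the item,
    repairable failures flag it for a fix pass."""
    if not scores["math_correctness"] or not scores["uniqueness_vs_canonical"]:
        return Verdict.DROP      # wrong math or a near-duplicate isn't worth fixing
    if all(scores.values()):
        return Verdict.PASS
    return Verdict.NEEDS_FIX     # fit/realism/asset issues are repairable

def batches(drafts: list[dict], size: int = 15) -> Iterator[list[dict]]:
    """Chunk drafts so one model call judges `size` items at once."""
    for i in range(0, len(drafts), size):
        yield drafts[i : i + size]
```

Judging 15 drafts per call instead of one trades a longer prompt for
roughly 15× fewer API calls, which is where the budget efficiency comes from.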
Plus 25 round-1 visual questions generated by the new batched generator
(5 calls × 5 cells each, zero failures).
The loop is the answer to "we need balance, not just volume": every
iteration's plan derives from a fresh analysis of where coverage is
weakest, so generation can never over-fill an already-saturated cell.
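As a rough skeleton of that control flow: the stage functions below are
abstracted behind callables, and the thresholds, defaults, and signatures
are assumptions, not iterate_coverage_loop.py's real interface.

```python
# Skeleton of the self-paced loop and its four stopping conditions. The
# conditions (a)-(d) match the description above; the callables, thresholds,
# and defaults are illustrative stand-ins.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Analysis:
    top_cell: str    # cell where coverage is currently weakest
    gap: float       # how far that cell is from its coverage target

def run_loop(
    analyze: Callable[[], Analysis],
    # One iteration: plan -> generate -> render -> judge -> apply for the
    # target cell; returns (api_calls_used, drop_rate).
    run_iteration: Callable[[str], tuple[int, float]],
    gap_threshold: float = 0.10,    # (a) assumed default
    drop_threshold: float = 0.40,   # (b) assumed default
    budget_calls: int = 200,        # (c) assumed default
) -> str:
    calls_used = 0
    prev_top = None
    while True:
        a = analyze()    # fresh coverage analysis on every pass
        if a.gap < gap_threshold:
            return "stop (a): top priority gap below threshold"
        if a.top_cell == prev_top:
            return "stop (d): same cell top priority twice (converged)"
        calls, drop_rate = run_iteration(a.top_cell)
        calls_used += calls
        if drop_rate > drop_threshold:
            return "stop (b): DROP rate signals saturation/hallucination"
        if calls_used > budget_calls:
            return "stop (c): API call budget exhausted"
        prev_top = a.top_cell
```

Because the plan is rebuilt from a fresh analyze() on every pass,
condition (d) doubles as a convergence test: if the weakest cell stays
the weakest even after targeted generation, the corpus has reached its
steady state.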