Vijay Janapa Reddi 0afc384282 feat(vault): LLM-as-judge validator + iterative coverage loop
Two new pieces close the generation→validation→saturation feedback loop:

1. gemini_cli_llm_judge.py — multi-criteria validator. For each draft,
   judges math correctness, cell-fit (does it actually target the
   declared track/zone/level?), scenario realism, uniqueness vs canonical
   questions, and visual-asset alignment. Returns PASS/NEEDS_FIX/DROP
   per item. Batched (default 15 per call) for budget efficiency.

2. iterate_coverage_loop.py — drives the full loop:
   analyze → plan → generate → render → judge → apply → re-analyze.
   Self-paced: stops when (a) top priority gap drops below threshold,
   (b) DROP rate exceeds the saturation/hallucination threshold,
   (c) total API calls exceed budget, or (d) the same cell is top
   priority for two iterations in a row (convergence). The user no
   longer specifies "how many questions"; the loop generates until
   the corpus reaches a measurable steady state.
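A minimal sketch of the loop's four stop conditions (a) to (d), plus the 15-item judge batching. It assumes the analysis step reports one top-priority cell and its gap size per iteration. Every name and default threshold here (`LoopConfig`, `IterationResult`, `chunk`, `stop_reason`, 0.1, 0.5, 100) is hypothetical. The commit does not show the real code or values.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional, Sequence, TypeVar

T = TypeVar("T")


# Hypothetical names and defaults throughout; not taken from iterate_coverage_loop.py.
@dataclass
class LoopConfig:
    gap_threshold: float = 0.1   # (a) stop once the top-priority gap is below this
    max_drop_rate: float = 0.5   # (b) saturation/hallucination threshold on DROP verdicts
    api_budget: int = 100        # (c) cap on total API calls across all iterations


@dataclass
class IterationResult:
    top_cell: str    # highest-priority cell in this iteration's analysis
    top_gap: float   # size of that cell's coverage gap
    dropped: int     # number of items the judge returned as DROP
    judged: int      # number of items judged this iteration


def chunk(items: Sequence[T], size: int = 15) -> Iterator[Sequence[T]]:
    """Split drafts into judge batches (15 per call by default)."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def stop_reason(history: List[IterationResult], total_calls: int,
                cfg: LoopConfig) -> Optional[str]:
    """Return why the loop should stop after the latest iteration, or None to continue."""
    last = history[-1]
    if last.top_gap < cfg.gap_threshold:
        return "gap below threshold"                      # (a)
    if last.judged and last.dropped / last.judged > cfg.max_drop_rate:
        return "drop rate too high"                       # (b)
    if total_calls > cfg.api_budget:
        return "API budget exceeded"                      # (c)
    if len(history) >= 2 and history[-2].top_cell == last.top_cell:
        return "converged"                                # (d)
    return None
```

This sketch checks the conditions in a fixed order, so the reported reason is the first one that fires. The actual script may order or combine them differently.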

Plus 25 round-1 visual questions generated by the new batched generator
(5 batched calls × 5 cells each, zero failures).

The loop is the answer to "we need balance, not just volume": every
iteration's plan derives from a fresh analysis of where coverage is
weakest, so generation can never over-fill an already-saturated cell.
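The gap-driven planning described above can be sketched as a greedy allocator. This sketch assumes coverage is a per-cell item count measured against one uniform target, with cells keyed by a hypothetical "track/zone/level" string. `plan_iteration`, `target`, and `slots` are illustrative names, not the repository's API.

```python
from typing import Dict, List


def plan_iteration(counts: Dict[str, int], target: int, slots: int) -> List[str]:
    """Hypothetical planner: hand generation slots to the cells with the largest gap.

    Cells already at or above `target` are excluded, so they receive no new items.
    """
    # Fresh analysis: remaining gap for every under-filled cell.
    remaining = {cell: target - n for cell, n in counts.items() if n < target}
    plan: List[str] = []
    for _ in range(slots):
        if not remaining:
            break  # every cell has reached the target; nothing left to plan
        # Largest gap first; ties broken alphabetically so the plan is deterministic.
        cell = min(remaining, key=lambda c: (-remaining[c], c))
        plan.append(cell)
        remaining[cell] -= 1
        if remaining[cell] == 0:
            del remaining[cell]
    return plan
```

Allocating one slot at a time shrinks the biggest gap first, then spreads slots across cells as their gaps even out. This greedy heuristic is an assumption made for illustration, not a confirmed detail of the loop's planner.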
2026-04-25 09:18:32 -04:00

import os
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.patches import Patch

# One timeline sample per minute over a 60-minute window.
time = np.arange(0, 60)

# Map each minute to a state code: 0 = crash, 1 = checkpoint, 2 = compute, 3 = recovery.
state = []
for t in time:
    if 0 <= t < 10:
        state.append(1)
    elif 10 <= t < 15:
        state.append(2)
    elif 15 <= t < 25:
        state.append(1)
    elif 25 <= t < 30:
        state.append(2)
    elif 30 <= t < 35:
        state.append(1)  # third checkpoint window, ends at the crash
    elif t == 35:
        state.append(0)  # crash at minute 35
    elif 35 < t <= 45:
        state.append(3)  # recovery, minutes 36-45
    else:
        state.append(2)  # compute resumes from minute 46

# Color per state code; must agree with the legend entries below.
colors = {0: 'red', 1: '#4a90c4', 2: '#3d9e5a', 3: '#c87b2a'}
fig, ax = plt.subplots(figsize=(10, 2))
# Draw the timeline as one 1-minute-wide horizontal bar segment per sample.
for t, s in enumerate(state):
    ax.barh(0, 1, left=t, color=colors[s], edgecolor='none')

ax.set_yticks([])
ax.set_xlabel('Time (Minutes)')
ax.set_title('Synchronous Checkpointing Overload (C=10, M=11.25)')

legend_elements = [
    Patch(facecolor='#3d9e5a', label='Compute'),
    Patch(facecolor='#4a90c4', label='Checkpoint'),
    Patch(facecolor='red', label='Crash'),
    Patch(facecolor='#c87b2a', label='Recovery'),
]
# Place the legend just outside the right edge of the axes.
ax.legend(handles=legend_elements, loc='upper right', bbox_to_anchor=(1.15, 1))
plt.tight_layout()
# Output path comes from VISUAL_OUT_PATH when set, else out.svg.
plt.savefig(os.environ.get('VISUAL_OUT_PATH', 'out.svg'), format='svg', bbox_inches='tight')