mirror of
https://github.com/harvard-edge/cs249r_book.git
synced 2026-05-08 02:28:25 -05:00
Final brute-force release-readiness pass: every gate green, 0.1.3
released and verified, every observable failure mode closed at source.
═══ AUDITS (G.A–G.D) ═══
G.A — gemini-3.1-pro-preview default everywhere. Active CLI scripts
already used it; bulk-patched 6 legacy scripts (`generate_batch.py`,
`validate_questions.py`, `generate_gaps.py`, `run_reviews.sh`,
`generate.py`, `review_math.sh`) plus WORKFLOW.md from `gemini-2.5-flash`
or `gemini-2.5-pro` to `gemini-3.1-pro-preview`. Only `archive/`
references remain (intentionally legacy).
G.B — Cloudflare workflow audit. `vault verify 0.1.1` correctly
failed (YAMLs evolved since 0.1.1 cut). Confirmed `vault publish`,
`vault deploy`, `vault ship`, `vault rollback`, `vault verify`,
`vault snapshot`, `vault tag` all wired. Released 0.1.2 then 0.1.3
to lock final state.
G.C — Visual asset integrity audit. 236/236 YAML visual references
resolve, 0 orphan SVGs, 0 missing files, 0 unrendered sources.
Clean.
G.D — Unit tests for new validators added at `tests/test_models.py`:
15 tests covering Visual.kind enum, Visual.path regex, Visual.alt
+ caption min lengths + required, Question._zone_bloom_compatible
(recall+remember accepted, recall+evaluate rejected, mastery+
remember rejected, evaluation+evaluate accepted, design+create
accepted), Question._visual_path_resolves. **15/15 pass.**
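The zone↔Bloom rule those tests exercise can be sketched as a simple allow-map. This is a hypothetical reconstruction from the pairings the test list names (`ZONE_BLOOM_ALLOWED` and `zone_bloom_compatible` are illustrative names, not the real `Question._zone_bloom_compatible` implementation):

```python
# Hypothetical sketch of the zone -> Bloom-level compatibility check.
# Only the pairings named in the test list above are grounded; the rest
# of each allow-set is an assumption for illustration.
ZONE_BLOOM_ALLOWED = {
    "recall": {"remember"},                        # recall+remember accepted
    "mastery": {"analyze", "evaluate", "create"},  # mastery+remember rejected
    "evaluation": {"evaluate"},                    # evaluation+evaluate accepted
    "design": {"create"},                          # design+create accepted
}


def zone_bloom_compatible(zone: str, bloom: str) -> bool:
    """Return True when the Bloom level is allowed for the given zone."""
    allowed = ZONE_BLOOM_ALLOWED.get(zone)
    return allowed is not None and bloom in allowed
```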
═══ CONTENT CLEANUP (G.E–G.L) ═══
G.E — Sample re-judge of 100 random cloud parallelism items via
Gemini 3.1 Pro Preview (4 API calls): 53% PASS / 23% NEEDS_FIX /
24% DROP. Surfaced legacy quality drift — items generated under laxer
pre-Phase-D prompts were not meeting the new strict bar
(math errors with bidirectional vs unidirectional NVLink,
"Based on the diagram..." references with no diagram, deprecated
practices like SSP for modern LLM training, wrong-track scenarios
like Cortex-M4 in cloud track).
G.H — General-purpose cleanup agent on 47 flagged items:
**31 rewritten** with PARALLELISM_RULES bar applied (concrete
unidirectional NVLink 450 GB/s, IB NDR 25 GB/s, RoCE v2 22 GB/s,
PCIe Gen3 12 GB/s; multi-step ring AllReduce arguments with the
2(N-1)/N factor; non-obvious failure modes); **16 archived** with
documented `deletion_reason` (mathematically broken premises,
physics errors, topic-irreconcilable, direct duplicates).
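The 2(N-1)/N factor referenced above is the per-rank traffic of a bandwidth-optimal ring AllReduce (reduce-scatter plus all-gather, each moving (N-1)/N of the gradient). A napkin-math sketch of that bound, using the unidirectional link numbers quoted; the function name and example sizes are illustrative:

```python
def ring_allreduce_seconds(grad_bytes: float, n_ranks: int, link_gbs: float) -> float:
    """Bandwidth-limited ring AllReduce time: each rank sends
    2*(N-1)/N of the gradient over its unidirectional link."""
    traffic = 2 * (n_ranks - 1) / n_ranks * grad_bytes
    return traffic / (link_gbs * 1e9)


# Example: 10 GB of gradients, 8 GPUs, NVLink at 450 GB/s unidirectional.
t = ring_allreduce_seconds(10e9, 8, 450)  # ≈ 0.039 s (about 39 ms)
```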
G.L — Re-judge of 31 G.H rewrites: **23 PASS / 3 NEEDS_FIX / 5 DROP =
74.2% pass rate**. The 8 still-failing items were archived: even after
the cleanup pass they could not satisfy the strict bar. Contract: items
get THREE chances (original generation, fix-agent, retry-fix), and if
they still fail they are archived, not promoted. Honest.
═══ STUBBORN-FAIL ARCHIVES (Phase F residuals) ═══
After three independent fix-agent passes (Phase C, F.2, F.4), 4 items
remained NEEDS_FIX or DROP: edge-2390, edge-2401, mobile-1948,
tinyml-1681. Archived with `deletion_reason` documenting the 3-attempt
failure history. The cell may be structurally awkward; the items are
preserved for audit but removed from the bundle.
═══ ORPHAN CHAIN FIX ═══
After archives, `cloud-chain-359` had only 1 published member
(`cloud-1840`); its sibling `cloud-1845` got archived. Dropped the
chain ref from cloud-1840 + ran `repair_chains.py` to clean residual
references in archived YAMLs. `vault check --strict` now passes 0
chain warnings.
═══ E.2 / E.3 SHIPPED EARLIER IN PRIOR COMMIT ═══
(Documented in commit `20ea20005` for completeness):
- `vault build --legacy-json` auto-emits `vault-manifest.json`.
- `analyze_coverage_gaps.py --include-areas <areas>` flag.
═══ 0.1.3 FINAL RELEASE ═══
`vault publish 0.1.3` snapshot at `releases/0.1.3/`. Migrations:
+0 ~27 -28 (zero net new questions, 27 modified during cleanup, 28
archived/promoted). `vault verify 0.1.3` ✓ — release_hash
`793c06f414f2bf8391a8a5c56ec0ff8d76bfce4ab7c64ad12ecb83f6d932280e`
reconstructs from YAML. Latest symlink → 0.1.3.
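A release hash that "reconstructs from YAML" is typically a digest over the snapshot's file contents in a deterministic order. A minimal sketch of that idea, assuming SHA-256 over sorted paths (`release_hash` is a hypothetical stand-in, not the real `vault verify` logic):

```python
import hashlib
from pathlib import Path


def release_hash(release_dir: str) -> str:
    """SHA-256 over every YAML file in the snapshot, visited in sorted
    path order, so identical YAML bytes always reproduce the hash."""
    h = hashlib.sha256()
    for path in sorted(Path(release_dir).rglob("*.yaml")):
        h.update(path.name.encode())  # bind file identity into the digest
        h.update(path.read_bytes())   # then the file contents
    return h.hexdigest()
```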
═══ FINAL ALL-9-GATES SWEEP — ALL GREEN ═══
[1] vault check --strict ✓ 10,701 / 0 errors / 0 invariants
[2] vault lint ✓ 0 errors / 0 warnings / 9,757 info
[3] vault doctor ✓ 0 fails (registry-history info OK)
[4] vault codegen --check ✓ artifacts in sync
[5] vault verify 0.1.3 ✓ hash reconstructs from YAML
[6] staffml validate-vault ✓ 0 errors / 0 warnings, deployment-ready
[7] render_visuals ✓ 236 visuals, 0 errors
[8] tsc ✓ TypeScript clean
[9] Playwright ✓ 9/9 pass
═══ FINAL CORPUS STATE ═══
Bundle: 9,757 published (was 9,224 at branch cut, **+533 net** across
the full multi-session push, after all archives).
Total commits on branch since cut: 10.
Release tag latest: 0.1.3 (verified-clean).
Status: StaffML-day-ready. Ship it.
152 lines
5.6 KiB
Python
#!/usr/bin/env python3
"""Fast parallel batch generator — fills ALL topic×track×zone gaps.

Reads pre-computed jobs from /tmp/staffml_jobs.json and spawns
gemini CLI calls in parallel, writing results to a temp file.

Usage:
    python3 generate_batch.py --workers 40 --output /tmp/batch_all.json
"""

import argparse
import json
import re
import subprocess
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

PLATFORMS = {
    "cloud": [
        ("NVIDIA H100", "80 GB HBM3, 495 TFLOPS FP16 dense, 3.35 TB/s, 700W"),
        ("Google TPU v5e", "16 GB HBM, 197 TFLOPS BF16, 1.6 TB/s"),
        ("AMD MI300X", "192 GB HBM3, 1300 TFLOPS FP16 sparse, 5.3 TB/s"),
        ("NVIDIA A100", "80 GB HBM2e, 312 TFLOPS FP16, 2.0 TB/s, 400W"),
    ],
    "edge": [
        ("NVIDIA Jetson Orin", "32 GB LPDDR5, 275 TOPS INT8, 60W"),
        ("Hailo-8", "26 TOPS INT8, 2.5W dataflow accelerator"),
        ("Google Coral Edge TPU", "4 TOPS INT8, 2W, INT8 only"),
        ("Qualcomm Cloud AI 100", "32 GB LPDDR4x, 400 TOPS INT8, 75W"),
    ],
    "mobile": [
        ("Apple A17 Pro", "35 TOPS Neural Engine, ~5W, 8 GB unified"),
        ("Snapdragon 8 Gen 3 Hexagon NPU", "45 TOPS INT8, 12-16 GB LPDDR5X"),
        ("Google Tensor G3", "7.5 TOPS on-device TPU, 12 GB LPDDR5X"),
        ("Samsung Exynos 2400 NPU", "34.7 TOPS, 12 GB LPDDR5X"),
    ],
    "tinyml": [
        ("ARM Cortex-M4 STM32F4", "168 MHz, 256 KB SRAM, no FPU"),
        ("ESP32-S3", "240 MHz dual-core, 512 KB SRAM, 8 MB PSRAM"),
        ("Cortex-M7 + Ethos-U55", "480 MHz, 512 KB SRAM, 128-512 MAC/cycle NPU"),
        ("Nordic nRF5340", "128 MHz, 256 KB SRAM, 1 MB flash, BLE"),
    ],
    "global": [
        ("Generic", "Cross-platform principles"),
    ],
}

ZONE_HINT = {
    "recall": "retrieve a fact or spec",
    "analyze": "explain WHY a system behaves this way (provide specs)",
    "design": "architect a system from requirements",
    "implement": "produce a number or formula",
    "diagnosis": "identify root cause from symptoms",
    "specification": "design a system meeting quantitative constraints",
    "fluency": "do napkin math from memory",
    "evaluation": "compare two architectures (provide specs)",
    "realization": "size a chosen architecture concretely",
    "optimization": "diagnose bottleneck AND quantify the fix",
    "mastery": "recall specs + analyze + design + size (all four skills)",
}

LEVEL_MAP = {
    "recall": "L2", "analyze": "L4", "design": "L5", "implement": "L3",
    "diagnosis": "L4", "specification": "L5", "fluency": "L3",
    "evaluation": "L5", "realization": "L6+", "optimization": "L4", "mastery": "L6+",
}

MODEL = "gemini-3.1-pro-preview"


def gen_one(idx, job):
    """Generate one question via a single gemini CLI call; None on failure."""
    plats = PLATFORMS.get(job["track"], PLATFORMS["global"])
    pname, pspecs = plats[idx % len(plats)]  # rotate platforms across jobs
    level = LEVEL_MAP.get(job["zone"], "L4")

    prompt = f"""Generate 1 ML systems interview question. Output ONLY valid JSON.
Topic: {job['name']} - {job['desc']}
Zone: {job['zone']} ({ZONE_HINT[job['zone']]})
Track: {job['track']} | Level: {level} | Area: {job['area']}
Platform: {pname} ({pspecs})
The scenario MUST reference {pname} with real specs.
JSON format: {{"title":"...","track":"{job['track']}","level":"{level}","topic":"{job['topic']}","zone":"{job['zone']}","competency_area":"{job['area']}","bloom_level":"analyze","scenario":"...","details":{{"common_mistake":"...","realistic_solution":"...","napkin_math":"..."}}}}"""

    try:
        r = subprocess.run(
            ["gemini", "-m", MODEL, "-o", "text"],
            input=prompt, capture_output=True, text=True, timeout=90,
        )
        if r.returncode != 0:
            return None
        text = r.stdout.strip()
        # Strip a markdown code fence if the model wrapped its JSON in one.
        if text.startswith("```"):
            text = re.sub(r"^```\w*\n?", "", text)
            text = re.sub(r"\n?```$", "", text)
        q = json.loads(text.strip())
        q["id"] = f"{job['track']}-fill-{idx:05d}"
        q["scope"] = ""
        q["validated"] = False
        q["validation_status"] = "pending"
        q["validation_issues"] = []
        q["validation_model"] = None
        q["validation_date"] = None
        q["chain_ids"] = None
        q["chain_positions"] = None
        return q
    except (subprocess.TimeoutExpired, json.JSONDecodeError, KeyError, OSError):
        # Timeouts, malformed JSON, missing fields, and exec failures
        # all count as a single error for the caller.
        return None


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--workers", type=int, default=40)
    parser.add_argument("--output", default="/tmp/batch_all.json")
    parser.add_argument("--jobs", default="/tmp/staffml_jobs.json")
    parser.add_argument("--limit", type=int, default=0, help="Limit jobs (0=all)")
    args = parser.parse_args()

    with open(args.jobs) as fh:
        jobs = json.load(fh)
    if args.limit:
        jobs = jobs[:args.limit]

    print(f"Jobs: {len(jobs)}, Workers: {args.workers}, Model: {MODEL}")

    generated = []
    errors = 0
    start = time.time()

    with ThreadPoolExecutor(max_workers=args.workers) as ex:
        futs = {ex.submit(gen_one, i, j): i for i, j in enumerate(jobs)}
        for f in as_completed(futs):
            q = f.result()
            if q:
                generated.append(q)
            else:
                errors += 1
            done = len(generated) + errors
            if done % 100 == 0:
                elapsed = time.time() - start
                rate = done / elapsed if elapsed > 0 else 0
                print(f"  [{done}/{len(jobs)}] gen={len(generated)} err={errors} ({rate:.1f}/s)")

    with open(args.output, "w") as fh:
        json.dump(generated, fh, indent=2, ensure_ascii=False)
    elapsed = time.time() - start
    print(f"\nDone: {len(generated)} generated, {errors} errors, {elapsed:.0f}s")
    print(f"Saved to {args.output}")


if __name__ == "__main__":
    main()
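The jobs file the script reads is a JSON list of dicts. Based only on the fields `gen_one` and the prompt template access, a minimal entry needs at least the keys below; the values shown here are hypothetical, not taken from the real `/tmp/staffml_jobs.json`:

```python
# Hypothetical minimal job entry, inferred from the fields the script reads.
# All values are illustrative placeholders.
example_job = {
    "name": "Data Parallelism",
    "desc": "Scaling training across multiple GPUs",
    "topic": "parallelism",
    "track": "cloud",
    "zone": "analyze",
    "area": "distributed-training",
}
```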