cs249r_book

mirror of https://github.com/harvard-edge/cs249r_book.git synced 2026-05-08 09:57:21 -05:00

Commit Graph

Author

SHA1

Message

Date

Vijay Janapa Reddi

3f0773706f

chore(vault): restore 6 unique-capability scripts as preserved-for-adaptation references

The Phase 0 cleanup removed 18 scripts as deprecated, but 6 of them have
unique-capability patterns not yet covered by the modern tooling. Restoring
them as reference patterns, not active scripts.

What's restored and why:

  gemini_backfill_question.py
    Idempotent corpus-walk + Gemini batch + thread-pool + JSON YAML
    round-trip. The "fix one field across thousands of YAMLs" pattern.
    To be mined in CORPUS_HARDENING_PLAN.md Phase 5.

  gpt_backfill_question.py
    OpenAI variant of the above. Cross-provider template.

  gemini_cli_generate_questions.py (35K)
    BATCHED generation: 12 cells per call with balanced track × area ×
    zone × level round-robin. `vault generate` does NOT batch — it calls
    once per question. This script's batching pattern is what we want
    when generating > 100 questions in bulk.

  generate.py (30K)
    Coverage-survey-driven generation engine: surveys the corpus, finds
    empty cells, generates to fill the emptiest first, stops when
    saturated. `vault generate` lacks this auto-balance loop.

  gemini_fix_errors.py
    Batch error-fixer with hardware-reference grounding (V100 / A100 /
    H100 / B200 / T4 specs as ground-truth context). To be mined for
    audit_corpus_batched.py --propose-fixes in Phase 5.

  deep_verify.py
    Claude Opus + extended thinking; SHOWS ITS WORK on every napkin-math
    claim. Useful as a tiebreaker on borderline math findings from the
    lightweight audit.
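
As a rough illustration of the two generation patterns described above (survey the corpus, fill the emptiest cells first, request 12 cells per call, stop when saturated), here is a minimal Python sketch. It is not the code from generate.py or gemini_cli_generate_questions.py. The axis values, the TARGET_PER_CELL threshold, and fake_generate_batch (a stand-in for a model call) are all assumptions made for the example.

```python
from collections import Counter
from itertools import product

# Hypothetical axis values; the real taxonomy is defined by the vault itself.
TRACKS = ["cloud", "edge"]
AREAS = ["compute", "memory"]
ZONES = ["recall", "design"]
LEVELS = ["L1", "L2", "L3"]

BATCH_SIZE = 12      # cells per generation call, as stated in the commit message
TARGET_PER_CELL = 2  # assumed saturation threshold, for illustration only


def survey(corpus):
    """Count existing questions per (track, area, zone, level) cell."""
    counts = Counter({cell: 0 for cell in product(TRACKS, AREAS, ZONES, LEVELS)})
    for q in corpus:
        counts[(q["track"], q["area"], q["zone"], q["level"])] += 1
    return counts


def next_batch(counts, batch_size=BATCH_SIZE, target=TARGET_PER_CELL):
    """Return up to batch_size unsaturated cells, emptiest first."""
    open_cells = [cell for cell, n in counts.items() if n < target]
    open_cells.sort(key=lambda cell: (counts[cell], cell))
    return open_cells[:batch_size]


def fill(corpus, generate_batch):
    """Survey, pick the emptiest cells, generate one batch, repeat until saturated."""
    calls = 0
    while True:
        batch = next_batch(survey(corpus))
        if not batch:
            return calls
        corpus.extend(generate_batch(batch))
        calls += 1


def fake_generate_batch(cells):
    """Stand-in for a model call: one placeholder question per requested cell."""
    return [dict(zip(("track", "area", "zone", "level"), cell)) for cell in cells]


corpus = [{"track": "cloud", "area": "compute", "zone": "recall", "level": "L1"}]
calls = fill(corpus, fake_generate_batch)
print(calls, len(corpus))
```

With 24 cells and one pre-existing question, the loop needs 47 more questions and reaches saturation in 4 batched calls instead of 47 single-question calls, which is the motivation the commit message gives for preserving the batching pattern.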

Each restored file has a 5-line STATUS comment block at the top
documenting what to adapt before running. DEPRECATED.md is restructured
to make the three categories explicit (removed / preserved-for-adaptation
/ active-migration), and adds an adaptation checklist that applies to
all preserved scripts (replace corpus.json loading, verify SDK pins,
update output paths, re-validate prompts, sample first).

Validation:
  vault check --strict — 10,711 loaded, 0 invariant failures
  pytest — 74/74
  ruff — clean

2026-05-03 07:50:28 -04:00

Vijay Janapa Reddi

56d3ed1551

chore(vault): remove 18 deprecated scripts per CORPUS_HARDENING_PLAN.md Phase 0

All 18 scripts pre-date the YAML-as-source-of-truth migration
(ARCHITECTURE.md v2.x, Phase 1) and are listed in DEPRECATED.md's
replaced-by table. The corpus.json they ran against is itself now a
build artifact (gitignored, regenerated by `vault build --local-json`).

Removed top-level (13):
  build_corpus.py        → vault build (walks YAML, emits vault.db)
  export_to_staffml.py   → vault build --local-json
  extract_taxonomy.py    → vault/taxonomy.yaml
  deep_verify.py         → audit_chains_with_gemini.py + validate_drafts.py
  gemini_*.py × 6        → Phase-7 vault generate / batched audit pipeline
  gpt_backfill_question.py
  gate.py                → obsolete after schema v1.0
  generate.py            → vault generate

Removed archive/ (5):
  expand_tracks.py, fill_zone_gaps.py, fill_gaps.sh, final_balance.sh,
  README.md (now-orphan).

DEPRECATED.md updated: replaced-by table reorganized as a removal log
for git-archaeology, with a note that historical implementations are
findable via `git log --diff-filter=D`.
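
For readers who want to script that lookup, a minimal Python sketch follows. It wraps `git log --diff-filter=D --name-only --pretty=format:` (standard git options) and collects the paths that were deleted anywhere in the history. The function names and the optional pathspec argument are illustrative assumptions, not part of the vault tooling.

```python
import subprocess


def parse_deleted_paths(log_output):
    """Collect unique paths from `git log --diff-filter=D --name-only` output."""
    return sorted({line.strip() for line in log_output.splitlines() if line.strip()})


def deleted_paths(repo_dir=".", pathspec=None):
    """Run git in repo_dir and list every path deleted in its history."""
    # An empty --pretty format suppresses commit headers, leaving only file names.
    cmd = ["git", "log", "--diff-filter=D", "--name-only", "--pretty=format:"]
    if pathspec:
        cmd += ["--", pathspec]
    result = subprocess.run(cmd, cwd=repo_dir, check=True,
                            capture_output=True, text=True)
    return parse_deleted_paths(result.stdout)
```

Once a deleted path and the commit that removed it are known, `git show <commit>^:<path>` prints the last version of the file from that commit's parent.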

Validation:
  vault check --strict — 10,711 loaded, 0 invariant failures
  pytest interviews/vault-cli/tests/ — 74/74
  ruff check interviews/vault-cli — clean

This is Phase 0 of CORPUS_HARDENING_PLAN.md.

2026-05-03 07:44:13 -04:00

Vijay Janapa Reddi

9955a76b92

feat(staffml): deep verification + mock NeurIPS reviews + paper improvements

Deep verification: 237-question stratified sample, 4.2% error rate found.
All 10 errors fixed (unit confusion, arithmetic, conceptual misapplication).
96 physics violations removed (impossible topic×track pairs).
Extended invariant checks added (applicability matrix enforcement).
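
One way to picture applicability matrix enforcement is the following Python sketch. The two topic names appear in this commit; the track names, the allowed sets, and the question records are hypothetical and chosen only to make the example run. The actual invariant checks in the vault may be structured differently.

```python
# Hypothetical applicability matrix: topic -> tracks where the topic makes sense.
APPLICABILITY = {
    "comm-compute-overlap": {"cloud"},
    "software-portability": {"cloud", "edge", "mobile"},
}


def applicability_violations(questions, matrix=APPLICABILITY):
    """Return (id, topic, track) for every question with a disallowed pair."""
    violations = []
    for q in questions:
        allowed = matrix.get(q["topic"])
        # A topic missing from the matrix is treated as a violation (an assumption).
        if allowed is None or q["track"] not in allowed:
            violations.append((q["id"], q["topic"], q["track"]))
    return violations


questions = [
    {"id": "q1", "topic": "comm-compute-overlap", "track": "cloud"},
    {"id": "q2", "topic": "comm-compute-overlap", "track": "mobile"},
    {"id": "q3", "topic": "software-portability", "track": "edge"},
]
print(applicability_violations(questions))
```

Run as an invariant, a non-empty result would fail the check, so an impossible topic×track pair cannot re-enter the corpus unnoticed.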

Paper improvements from mock NeurIPS review feedback:
- Bloom critique softened ("complements" not "departs from")
- LLM generation transparency (95% ratio + 4.2% error rate disclosed)
- Scope explicitly limited to technical systems reasoning
- H100 specs corrected (989 TFLOPS, not 495)
- Track percentages reference table instead of hardcoding
- Figure captions use macros for consistency

New topics with questions: software-portability (50), comm-compute-overlap (50).
Phase metadata reclassified (42.5% inference, 37.7% both, 19.9% training).
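
The headline figures in this message can be sanity-checked with a few lines of Python. The sketch below only re-derives numbers already stated above: 10 errors in a 237-question sample, and the three phase shares. The variable names are illustrative.

```python
from decimal import Decimal

# Error rate: 10 errors found in a 237-question sample.
errors, sample_size = 10, 237
error_rate_pct = round(100 * errors / sample_size, 1)

# Phase shares as reported; Decimal avoids binary floating-point noise in the sum.
phase_shares = [Decimal("42.5"), Decimal("37.7"), Decimal("19.9")]
phase_total = sum(phase_shares)

print(error_rate_pct, phase_total)
```

The error rate comes out at 4.2%, matching the message. The phase shares total 100.1%, which most likely reflects each share being rounded to one decimal place independently; that reading is an assumption, not something the commit states.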

2026-04-02 07:28:41 -04:00

3 Commits