Sync the yaml-audit branch with the latest dev work since the previous
sync (5c5af75ed). This brings in 73 commits, including:
- CI security fixes: postcss XSS bump, uuid bounds bump, CodeQL
  paths-ignore for vendored bundles, read-only token on the
  staffml-validate-vault workflow
- kits/ dark mode polish: code-block readability, dropdown contrast
- vault-cli/: pre-commit ruff hook + 20 ruff fixes, all-contributors
auto-credit workflow change to pull_request_target
- dev's earlier merge of yaml-audit (836d481b5) carrying the
pre-trailer-strip Phase 1/2/3 history; this merge harmonises that
with the current trailer-clean yaml-audit tip
- misc bug fixes (tinytorch perceptron seed, infra workflows,
socratiq vite dev injector)
Conflicts were resolved to preserve the yaml-audit-side authoritative
state for vault/* files (we own those) and the dev-side authoritative
state for .github/workflows/* and other shared infrastructure.
# Conflicts:
# .github/workflows/all-contributors-auto-credit.yml
# .github/workflows/staffml-preview-dev.yml
# interviews/staffml/src/data/corpus-summary.json
# interviews/staffml/src/data/vault-manifest.json
# interviews/staffml/tests/chain-and-vault-smoke.mjs
# interviews/vault-cli/README.md
# interviews/vault-cli/docs/CHAIN_ROADMAP.md
# interviews/vault-cli/scripts/build_chains_with_gemini.py
# interviews/vault-cli/scripts/generate_question_for_gap.py
# interviews/vault-cli/scripts/merge_chain_passes.py
# interviews/vault-cli/scripts/validate_drafts.py
# interviews/vault-cli/src/vault_cli/legacy_export.py
# interviews/vault-cli/tests/test_chain_validation.py
# interviews/vault/.gitignore
# interviews/vault/ARCHITECTURE.md
# interviews/vault/chains.json
# interviews/vault/id-registry.yaml
# interviews/vault/questions/edge/optimization/edge-2536.yaml
# interviews/vault/questions/mobile/deployment/mobile-2147.yaml
# tinytorch/src/03_layers/03_layers.py
Establishes one ignored subdirectory for ALL intermediate outputs of
LLM-driven tooling (chain proposals, gap detection, draft scorecards,
audit traces). Single gitignore rule: /_pipeline/.
Convention is documented in interviews/vault/README.md under "Pipeline
artifacts" — it's a real project layout convention, not AI-specific
config.
Path migration:
interviews/vault/chains.proposed*.json
→ _pipeline/chains.proposed*.json
interviews/vault/gaps.proposed*.json
→ _pipeline/gaps.proposed*.json
interviews/vault/draft-validation-scorecard.json
→ _pipeline/draft-validation-scorecard.json
interviews/vault/audit-runs/
→ _pipeline/runs/
Eight scripts were updated to define a PIPELINE_DIR constant and route
their default outputs through it: build_chains_with_gemini.py,
apply_proposed_chains.py, merge_chain_passes.py, validate_drafts.py,
audit_chains_with_gemini.py, generate_question_for_gap.py,
summarize_proposed_chains.py, promote_drafts.py.
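The shared constant convention can be sketched roughly as below. This is an illustrative reconstruction, not the scripts' actual internals: the VAULT_DIR resolution and the pipeline_path helper are hypothetical names.

```python
import os
from pathlib import Path

# Hypothetical sketch: every pipeline script routes default outputs
# through one PIPELINE_DIR, so the single /_pipeline/ gitignore rule
# covers all of them. VAULT_DIR resolution here is an assumption.
VAULT_DIR = Path(os.environ.get("VAULT_DIR", "interviews/vault"))
PIPELINE_DIR = VAULT_DIR / "_pipeline"

def pipeline_path(*parts: str) -> Path:
    """Build a path under _pipeline/, creating parent directories."""
    out = PIPELINE_DIR.joinpath(*parts)
    out.parent.mkdir(parents=True, exist_ok=True)
    return out
```

With this in place, a script writes e.g. `pipeline_path("runs", ts, "config.json")` instead of hard-coding a repo-relative output path.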
Forward-looking docs (README.md chain-pipeline section + CHAIN_ROADMAP.md
resume instructions + state snapshot) updated to reference the new
paths. Historical Progress Log entries left as-is — they accurately
describe what was committed at the time.
Drive-by .gitignore fixes (both entries used full repo-relative paths
inside package-local .gitignore files, so they never matched anything):
interviews/vault-cli/.gitignore: scripts/.calibration_cache/
interviews/vault/.gitignore: /embeddings.npz
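For context on why the old entries were dead: patterns in a .gitignore are matched relative to that file's own directory, so a repo-relative prefix inside a package-local file looks for a doubled path that never exists. An illustrative before/after for the vault-cli case (the broken line is presumed from the description above):

```gitignore
# interviews/vault-cli/.gitignore
# broken (presumed): resolved to interviews/vault-cli/interviews/vault-cli/...
interviews/vault-cli/scripts/.calibration_cache/
# fixed: relative to this .gitignore's directory
scripts/.calibration_cache/
```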
Validation:
- vault check --strict: 10,705 loaded, 0 invariant failures
- pytest interviews/vault-cli/tests/: 74/74
- audit --dry-run: paths resolve correctly to _pipeline/runs/<ts>/
No durable corpus content moves. chains.json (live registry),
id-registry.yaml, questions/, etc. all stay where they were.
Single-driver script that runs an independent Gemini audit over the
Phase 1-3 chain pipeline output. It complements the pipeline's own
validation gates (Pydantic schema, embedding cosine, multiple LLM
judges) with an INDEPENDENT model pass over what would otherwise be
human spot-check territory.
Categories (5 audit + 1 synthesis call = ~18 total calls, well under
the 250/day Pro cap):
1. drafts      the 4 Phase 3 promoted drafts: independent quality gate
               (fabrication, level fit, answer correctness, scenario
               realism; failure modes the existing judges miss)
2. secondary   100-chain sample of tier=secondary chains
3. delta_zero  all 55 Δ=0 chains (highest-risk lenient additions);
               verifies the "shared scenario" claim per pair
4. primary     100-chain sample of tier=primary chains (regression
               check on strict-pass quality)
5. gaps        50-gap sample with the two between-questions in full
               (real bridge vs. hallucination)
6. synthesis   1 wrap-up call → AUDIT_REPORT.md
A previously planned tier_compare category was dropped: no bucket
carries both primary and secondary chains (the lenient sweep was scoped
to uncovered buckets, which are disjoint by definition). Per-tier
quality is instead inferred from categories 2 and 4 by the synthesis
call.
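The per-category samples above can be drawn reproducibly with a seeded RNG. A minimal sketch of the idea, with hypothetical function and argument names:

```python
import random

def sample_category(chains, k, seed):
    """Deterministically sample up to k chains for one audit category."""
    rng = random.Random(seed)  # isolated RNG: categories don't perturb each other
    if len(chains) <= k:
        return list(chains)    # e.g. all 55 delta-zero chains are audited in full
    return rng.sample(chains, k)
```

The same seed always yields the same sample, which is what makes `--seed`-based re-runs comparable across audit runs.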
Per-call target: ~80K input tokens (~320K-char prompts), the attention
sweet spot. Chain payloads run ~2-3K chars each, so ~50 chains pack into
one such prompt.
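Packing chains under the per-call character budget can be sketched as a simple greedy grouping; this is an illustrative reconstruction, not the script's actual code:

```python
MAX_PROMPT_CHARS = 320_000  # ~80K tokens at roughly 4 chars/token

def pack_batches(payloads, budget=MAX_PROMPT_CHARS):
    """Greedily group serialized chain payloads under a per-call char budget."""
    batches, current, size = [], [], 0
    for p in payloads:
        if current and size + len(p) > budget:
            batches.append(current)   # budget reached: start a new call
            current, size = [], 0
        current.append(p)
        size += len(p)
    if current:
        batches.append(current)
    return batches
```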
Outputs land in interviews/vault/audit-runs/<UTC-timestamp>/:
  config.json         what was sampled, with the seed for reproducibility
  0N_<category>.json  per-call prompt-char count, IDs, and raw response
plus one human-readable rollup at interviews/vault/AUDIT_REPORT.md.
Modes: --dry-run (plan only), --only <cat>, --skip <cat,...>,
--seed (for reproducible re-runs).
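The CLI surface might be reconstructed as follows; option spellings match the modes listed above, but the parser itself is a hypothetical sketch of the driver:

```python
import argparse

def build_parser():
    p = argparse.ArgumentParser(description="Independent Gemini audit driver")
    p.add_argument("--dry-run", action="store_true",
                   help="print the sampling plan without making API calls")
    p.add_argument("--only", metavar="CAT",
                   help="run a single category, e.g. delta_zero")
    p.add_argument("--skip", metavar="CAT,...",
                   help="comma-separated categories to skip")
    p.add_argument("--seed", type=int, default=0,
                   help="RNG seed for reproducible re-runs")
    return p
```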
Findings only: the script never edits chains.json or any question YAML.
Issues are surfaced for human review.