cs249r_book

mirror of https://github.com/harvard-edge/cs249r_book.git synced 2026-05-07 02:03:55 -05:00

Commit Graph

Vijay Janapa Reddi

a74c98576e

Merge origin/dev into yaml-audit

Sync the yaml-audit branch with the latest dev work since the previous
sync (5c5af75ed). Brings in 73 commits including:

  - CI security fixes: postcss XSS bump, uuid bounds bump, codeql
    paths-ignore for vendored bundles, read-only token on
    staffml-validate-vault workflow
  - kits/ dark mode polish: code-block readability, dropdown contrast
  - vault-cli/: pre-commit ruff hook + 20 ruff fixes, all-contributors
    auto-credit workflow change to pull_request_target
  - dev's earlier merge of yaml-audit (836d481b5) carrying the
    pre-trailer-strip Phase 1/2/3 history; this merge harmonises that
    with the current trailer-clean yaml-audit tip
  - misc bug fixes (tinytorch perceptron seed, infra workflows,
    socratiq vite dev injector)

Conflict resolutions preserve the yaml-audit-side authoritative state
for vault/* files (we own those) and the dev-side authoritative state
for .github/workflows/* and other shared infrastructure.
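
The ownership rule above can be written down as a path-to-branch lookup. This is a minimal sketch, not a script from the repo. `authoritative_side` and `OWNERSHIP_RULES` are hypothetical names. Mapping "vault/*" to the `interviews/vault/` prefix is an assumption. Paths the message does not name explicitly are returned as "manual" rather than guessed.

```python
from fnmatch import fnmatch

# Hypothetical encoding of the stated ownership rule. The
# "interviews/vault/" prefix is an assumption about where vault/* lives.
OWNERSHIP_RULES = [
    ("interviews/vault/*", "yaml-audit"),  # vault data: yaml-audit owns it
    (".github/workflows/*", "dev"),        # shared CI infrastructure
]


def authoritative_side(path: str) -> str:
    """Return which branch's version wins for a conflicted path."""
    for pattern, side in OWNERSHIP_RULES:
        # fnmatch's "*" also matches "/", so nested files are covered.
        if fnmatch(path, pattern):
            return side
    # Paths the rule does not name explicitly are left for manual review.
    return "manual"
```

In a merge run from yaml-audit, `git checkout --ours -- <path>` keeps the yaml-audit version and `git checkout --theirs -- <path>` keeps dev's.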

# Conflicts:
#	.github/workflows/all-contributors-auto-credit.yml
#	.github/workflows/staffml-preview-dev.yml
#	interviews/staffml/src/data/corpus-summary.json
#	interviews/staffml/src/data/vault-manifest.json
#	interviews/staffml/tests/chain-and-vault-smoke.mjs
#	interviews/vault-cli/README.md
#	interviews/vault-cli/docs/CHAIN_ROADMAP.md
#	interviews/vault-cli/scripts/build_chains_with_gemini.py
#	interviews/vault-cli/scripts/generate_question_for_gap.py
#	interviews/vault-cli/scripts/merge_chain_passes.py
#	interviews/vault-cli/scripts/validate_drafts.py
#	interviews/vault-cli/src/vault_cli/legacy_export.py
#	interviews/vault-cli/tests/test_chain_validation.py
#	interviews/vault/.gitignore
#	interviews/vault/ARCHITECTURE.md
#	interviews/vault/chains.json
#	interviews/vault/id-registry.yaml
#	interviews/vault/questions/edge/optimization/edge-2536.yaml
#	interviews/vault/questions/mobile/deployment/mobile-2147.yaml
#	tinytorch/src/03_layers/03_layers.py

2026-05-02 11:06:43 -04:00

Vijay Janapa Reddi

e43ff34719

feat(vault-cli): chain audit + rescue suggestions with embedding similarity

Adds two subcommands and supporting modules:

  vault chains audit
    Reports chain health: orphans, position-drift (gaps from filtered
    members), stale-registry, intra-chain cosine distribution, weakest
    chains list. Embedding-aware, with a --no-embeddings escape hatch.
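
A minimal sketch of three of the audit checks for a single chain. It assumes the following:

- A chain is stored as (question_id, position) pairs with 0-based positions.
- Filtered members are simply absent from the active set.
- An "orphan" is a chain left with a single surviving member.

`audit_chain` and its report keys are hypothetical, and stale-registry detection is not modeled. Passing `vectors=None` mimics the --no-embeddings path.

```python
import math
from statistics import mean


def cosine(a: list[float], b: list[float]) -> float:
    """Plain cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def audit_chain(members, active, vectors=None) -> dict:
    """Health checks for one chain (hypothetical data model).

    members: list of (question_id, position), positions assumed 0-based.
    active:  set of question ids that survived filtering.
    vectors: optional question_id -> embedding; None mimics --no-embeddings.
    """
    kept = sorted((pos, qid) for qid, pos in members if qid in active)
    positions = [pos for pos, _ in kept]
    report = {
        # Assumed definition: a chain reduced to one member links nothing.
        "orphan": len(kept) == 1,
        # Filtering left a gap, so stored positions no longer run 0..n-1.
        "position_drift": positions != list(range(len(positions))),
    }
    if vectors is not None and len(kept) >= 2:
        # Cosine between consecutive surviving members; low means a weak chain.
        sims = [cosine(vectors[a], vectors[b]) for (_, a), (_, b) in zip(kept, kept[1:])]
        report["mean_adjacent_cosine"] = mean(sims)
    return report
```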

  vault chains suggest
    For each orphan singleton, ranks rescue candidates within the same
    (track, topic) bucket. Hybrid scoring:
      HARD filter: level_delta in {0, 1, 2} (matches 92% of observed
                   chain edges across the corpus)
      SOFT rank:   embedding cosine + delta=1 priority
      Bands:       strong-merge / review-merge / below-threshold
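
A sketch of the hybrid scoring for one orphan, under stated assumptions. The band cutoffs and the size of the delta=1 bonus are hypothetical, because the commit names the bands but not their values. level_delta is taken here as the unsigned level gap, though the real sign convention may differ. Bands are assigned from the raw cosine, while the bonus only affects ordering.

```python
from dataclasses import dataclass

# Hypothetical values: the commit message names the bands but not the cutoffs.
STRONG_MERGE_MIN = 0.85
REVIEW_MERGE_MIN = 0.78
DELTA1_BONUS = 0.02          # assumed size of the "delta=1 priority" nudge
ALLOWED_DELTAS = {0, 1, 2}   # HARD filter from the commit message


@dataclass
class Candidate:
    qid: str
    level: int
    cosine: float  # embedding cosine between this candidate and the orphan


def band(cosine: float) -> str:
    """Bucket a raw cosine into one of the three suggestion bands."""
    if cosine >= STRONG_MERGE_MIN:
        return "strong-merge"
    if cosine >= REVIEW_MERGE_MIN:
        return "review-merge"
    return "below-threshold"


def rank_candidates(orphan_level: int, candidates: list[Candidate]) -> list[tuple[float, str, str]]:
    """Return (score, qid, band) for same-bucket candidates, best first."""
    ranked = []
    for c in candidates:
        delta = abs(c.level - orphan_level)  # assumed: unsigned level gap
        if delta not in ALLOWED_DELTAS:
            continue  # HARD filter: drop candidates with a large level jump
        # SOFT rank: cosine, nudged upward when delta == 1.
        score = c.cosine + (DELTA1_BONUS if delta == 1 else 0.0)
        ranked.append((score, c.qid, band(c.cosine)))
    ranked.sort(key=lambda t: t[0], reverse=True)
    return ranked
```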

Embeddings: bge-small-en-v1.5 (BAAI). Calibrated via
scripts/calibrate_chain_embeddings.py against the 726 healthy chains.
Empirical findings (in script header docstring):
  - bge-small precision@1 = 0.283, recall@3 = 0.447
  - bge-large gains only +0.013 P@1 at 7x embedding time — not worth it
  - Same-bucket questions are inherently close (μ_pos=0.785 vs
    μ_neg=0.757), so this is suggestion-only and never auto-applied.
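
One common way to compute the two reported metrics is shown below. It assumes each held-out query question has a ranked candidate list and a set of true chain neighbours. The calibration script's exact definitions may differ, and the function names are hypothetical. For scale, the reported means differ by only 0.785 − 0.757 = 0.028, which is the closeness the findings above cite.

```python
def precision_at_1(rankings: dict[str, list[str]], positives: dict[str, set[str]]) -> float:
    """Fraction of queries whose top-ranked candidate is a true chain neighbour."""
    hits = sum(1 for q, ranked in rankings.items() if ranked and ranked[0] in positives[q])
    return hits / len(rankings)


def recall_at_k(rankings: dict[str, list[str]], positives: dict[str, set[str]], k: int = 3) -> float:
    """Mean fraction of each query's true neighbours found in its top k."""
    per_query = [
        len(set(ranked[:k]) & positives[q]) / len(positives[q])
        for q, ranked in rankings.items()
        if positives[q]  # skip queries with no known neighbours
    ]
    return sum(per_query) / len(per_query)
```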

Cross-encoder rerank experiment script included for future research
(BAAI/bge-reranker-base) — current run OOM'd on 16GB; deferred.

Embedding cache (.npz) is gitignored — reproducible from source.
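
A sketch of a gitignored cache that stays reproducible from source. It assumes the cache is keyed on a digest of question ids and texts. `load_or_embed` and `embed_fn` are hypothetical; `embed_fn` stands in for whatever calls bge-small-en-v1.5. The cache path should end in `.npz`, since `np.savez_compressed` appends that suffix otherwise.

```python
import hashlib
from pathlib import Path

import numpy as np


def corpus_digest(ids: list[str], texts: list[str]) -> str:
    """Stable hash of the embedded source, used to detect a stale cache."""
    h = hashlib.sha256()
    for qid, text in zip(ids, texts):
        h.update(qid.encode("utf-8") + b"\0" + text.encode("utf-8") + b"\0")
    return h.hexdigest()


def load_or_embed(ids, texts, embed_fn, cache_path) -> np.ndarray:
    """Reuse cached vectors if the source is unchanged, else recompute and save."""
    cache_path = Path(cache_path)
    digest = corpus_digest(ids, texts)
    if cache_path.exists():
        with np.load(cache_path) as data:
            if data["digest"].item() == digest:
                return data["vectors"]  # loaded into memory before the file closes
    vectors = np.asarray(embed_fn(texts), dtype=np.float32)
    np.savez_compressed(cache_path, digest=np.array(digest), ids=np.array(ids), vectors=vectors)
    return vectors
```

Keying on a digest of ids and texts means an edited question invalidates the cache rather than silently reusing stale vectors.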

2026-04-29 19:00:09 -04:00
