Commit Graph

266 Commits

Author SHA1 Message Date
Vijay Janapa Reddi
825d9571a6 chore: remove archived content and refresh contributor docs
- Remove retired _archive/ and scripts/archive/ trees (site, book filters, games, vault); vault CHANGELOG points to git history for old scripts.
- CONTRIBUTING: site project row, site/ in area map, root vs TinyTorch pre-commit, vault schema drift wording.
- Newsletter CLI: path-agnostic news alias; tinytorch pre-commit comments; add tools/ and staffml-vault-types READMEs for maintainers.
2026-05-02 10:48:00 -04:00
Vijay Janapa Reddi
eb71638630 feat(vault): release-grade Phase G — full audit + cleanup + 0.1.3 release
Final brute-force release-readiness pass: every gate green, 0.1.3
released and verified, every observable failure mode closed at source.

═══ AUDITS (G.A–G.D) ═══

G.A — gemini-3.1-pro-preview default everywhere. Active CLI scripts
    already used it; bulk-patched 6 legacy scripts (`generate_batch.py`,
    `validate_questions.py`, `generate_gaps.py`, `run_reviews.sh`,
    `generate.py`, `review_math.sh`) and WORKFLOW.md from `gemini-2.5-flash`
    or `gemini-2.5-pro` to `gemini-3.1-pro-preview`. Only `archive/`
    references remain (intentionally legacy).

G.B — Cloudflare workflow audit. `vault verify 0.1.1` correctly
    failed (YAMLs evolved since the 0.1.1 cut). Confirmed `vault publish`,
    `vault deploy`, `vault ship`, `vault rollback`, `vault verify`,
    `vault snapshot`, `vault tag` all wired. Released 0.1.2 then 0.1.3
    to lock final state.

G.C — Visual asset integrity audit. 236/236 YAML visual references
    resolve, 0 orphan SVGs, 0 missing files, 0 unrendered sources.
    Clean.

G.D — Unit tests for new validators added at `tests/test_models.py`:
    15 tests covering Visual.kind enum, Visual.path regex, Visual.alt
    + caption min lengths + required, Question._zone_bloom_compatible
    (recall+remember accepted, recall+evaluate rejected, mastery+
    remember rejected, evaluation+evaluate accepted, design+create
    accepted), Question._visual_path_resolves. **15/15 pass.**

═══ CONTENT CLEANUP (G.E–G.L) ═══

G.E — Sample re-judge of 100 random cloud parallelism items via
    Gemini 3.1 Pro Preview (4 API calls): 53% PASS / 23% NEEDS_FIX /
    24% DROP. Surfaced legacy quality drift — items generated under
    pre-Phase-D laxer prompts were not meeting the new strict bar
    (math errors with bidirectional vs unidirectional NVLink,
    "Based on the diagram..." references with no diagram, deprecated
    practices like SSP for modern LLM training, wrong-track scenarios
    like Cortex-M4 in cloud track).

G.H — General-purpose cleanup agent on 47 flagged items:
    **31 rewritten** with PARALLELISM_RULES bar applied (concrete
    unidirectional NVLink 450 GB/s, IB NDR 25 GB/s, RoCE v2 22 GB/s,
    PCIe Gen3 12 GB/s; multi-step ring AllReduce arguments with the
    2(N-1)/N factor; non-obvious failure modes); **16 archived** with
    documented `deletion_reason` (mathematically broken premises,
    physics errors, topic-irreconcilable, direct duplicates).

G.L — Re-judge of 31 G.H rewrites: **23 PASS / 3 NEEDS_FIX / 5 DROP =
    74.2% pass rate**. The 8 still-failing items were archived because
    the cleanup pass could not bring them up to the strict bar. Contract:
    items get THREE chances (original generation, fix-agent, retry-fix);
    any item that still fails is archived, not promoted. Honest.
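
The three-chance contract can be stated as a small decision function. This is a sketch only: `final_disposition` and `MAX_ATTEMPTS` are hypothetical names, and it assumes verdicts are recorded oldest first. It also re-checks the G.L pass rate.

```python
MAX_ATTEMPTS = 3  # original generation, fix-agent pass, retry-fix pass

def final_disposition(verdicts: list[str]) -> str:
    """Map an item's judge verdicts (oldest first) to its fate.

    'promote' if any of the first MAX_ATTEMPTS verdicts is PASS,
    'retry' if attempts remain, otherwise 'archive' (the YAML stays on
    disk with a deletion_reason but leaves the bundle).
    """
    if "PASS" in verdicts[:MAX_ATTEMPTS]:
        return "promote"
    if len(verdicts) < MAX_ATTEMPTS:
        return "retry"
    return "archive"

# G.L arithmetic: 23 PASS out of 31 re-judged rewrites.
pass_rate = 23 / (23 + 3 + 5)
print(f"{pass_rate:.1%}")
```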

═══ STUBBORN-FAIL ARCHIVES (Phase F residuals) ═══

After three independent fix-agent passes (Phase C, F.2, F.4), 4 items
remained NEEDS_FIX or DROP: edge-2390, edge-2401, mobile-1948,
tinyml-1681. Archived with `deletion_reason` documenting the 3-attempt
failure history. The underlying cells may be structurally awkward; the
items are preserved for audit but removed from the bundle.

═══ ORPHAN CHAIN FIX ═══

After archives, `cloud-chain-359` had only 1 published member
(`cloud-1840`); its sibling `cloud-1845` got archived. Dropped the
chain ref from cloud-1840 + ran `repair_chains.py` to clean residual
references in archived YAMLs. `vault check --strict` now passes 0
chain warnings.

═══ E.2 / E.3 SHIPPED IN PRIOR COMMIT ═══

(Shipped in commit `20ea20005`; listed here for completeness):
- `vault build --legacy-json` auto-emits `vault-manifest.json`.
- `analyze_coverage_gaps.py --include-areas <areas>` flag.

═══ 0.1.3 FINAL RELEASE ═══

`vault publish 0.1.3` snapshot at `releases/0.1.3/`. Migrations:
+0 ~27 -28 (no new questions, 27 modified during cleanup, 28
archived). `vault verify 0.1.3` ✓: release_hash
`793c06f414f2bf8391a8a5c56ec0ff8d76bfce4ab7c64ad12ecb83f6d932280e`
reconstructs from YAML. Latest symlink → 0.1.3.
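
This log does not describe how release_hash is actually computed. The sketch below assumes one plausible scheme: SHA-256 over every YAML file in sorted relative-path order, with each path and each body length-prefixed so that filesystem iteration order cannot change the digest. `release_hash` and `verify` are hypothetical names.

```python
import hashlib
from pathlib import Path

def release_hash(yaml_root: Path) -> str:
    """SHA-256 over all *.yaml files under yaml_root, in sorted path order."""
    digest = hashlib.sha256()
    files = sorted(yaml_root.rglob("*.yaml"),
                   key=lambda p: p.relative_to(yaml_root).as_posix())
    for path in files:
        for chunk in (path.relative_to(yaml_root).as_posix().encode("utf-8"),
                      path.read_bytes()):
            # Length-prefix each chunk so path/content boundaries are unambiguous.
            digest.update(len(chunk).to_bytes(8, "big"))
            digest.update(chunk)
    return digest.hexdigest()

def verify(yaml_root: Path, recorded_hash: str) -> bool:
    """A release verifies if its recorded hash reconstructs from the YAML tree."""
    return release_hash(yaml_root) == recorded_hash
```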

═══ FINAL ALL-9-GATES SWEEP — ALL GREEN ═══

[1] vault check --strict          ✓ 10,701 / 0 errors / 0 invariant failures
[2] vault lint                    ✓ 0 errors / 0 warnings / 9,757 info
[3] vault doctor                  ✓ 0 fails (registry-history info OK)
[4] vault codegen --check         ✓ artifacts in sync
[5] vault verify 0.1.3            ✓ hash reconstructs from YAML
[6] staffml validate-vault        ✓ 0 errors / 0 warnings, deployment-ready
[7] render_visuals                ✓ 236 visuals, 0 errors
[8] tsc                           ✓ TypeScript clean
[9] Playwright                    ✓ 9/9 pass

═══ FINAL CORPUS STATE ═══

Bundle: 9,757 published (was 9,224 at branch cut, **+533 net** across
the full multi-session push, after all archives).

Total commits on branch since cut: 10.
Release tag latest: 0.1.3 (verified-clean).
Status: StaffML-day-ready. Ship it.
2026-04-25 19:45:32 -04:00
Vijay Janapa Reddi
20ea20005c feat(vault): release-readiness final pass — E.2 + E.3 + F.4/F.5 + CHANGELOG
Closes the release-readiness push. All 8 gates green: vault check,
lint, doctor, codegen, validate-vault, render, tsc, Playwright.
Bundle: 9,775 → 9,781 published.

E.2 — Auto-emit vault-manifest.json from `vault build --legacy-json`:
    Added `emit_manifest()` to `legacy_export.py` and wired it into
    `commands/build.py` after the legacy corpus emission. The manifest
    is now derived deterministically from the same `loaded` set that
    produced corpus.json — track + level distributions, contentHash,
    counts. Eliminates the recurring stale-manifest pre-commit failure
    that had to be patched by hand twice during this push.
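
The key design point is that the manifest is computed from the same in-memory `loaded` set that produced corpus.json. Because of that, it cannot go stale relative to the corpus. Below is a minimal sketch. `build_manifest` and its field names (apart from `contentHash`) are hypothetical, and the real `emit_manifest()` signature may differ.

```python
import hashlib
import json
from collections import Counter

def build_manifest(loaded: list[dict]) -> dict:
    """Derive counts, distributions and a content hash from the loaded set."""
    canonical = json.dumps(sorted(loaded, key=lambda q: q["id"]),
                           sort_keys=True, separators=(",", ":"))
    return {
        "count": len(loaded),
        "tracks": dict(sorted(Counter(q["track"] for q in loaded).items())),
        "levels": dict(sorted(Counter(q["level"] for q in loaded).items())),
        # Truncated to 12 hex digits, like the contentHash values in these commits.
        "contentHash": hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12],
    }
```

Sorting by `id` and serializing with `sort_keys=True` makes the hash independent of load order.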

E.3 — `--include-areas` flag in analyze_coverage_gaps.py:
    Injects forced area-targeted cells into the recommended_plan for
    each listed competency_area (parallelism, networking, etc.). For
    each (track, area) where area is in the include list, adds 1 cell
    per (canonical-topic × {L4, L5, L6+}) zone. Closes the structural
    mismatch where topic-priority ranking misses area-level gaps.
    Tested with `--include-areas parallelism`: plan now includes 21
    parallelism-topic cells (was 0 in stock plan).
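
The injection step amounts to a cross product appended to the ranked plan. In this sketch, `inject_area_cells`, `topics_by_area`, and the topic names are hypothetical. Cells are keyed on (track, topic, level), and the zone axis is omitted for brevity.

```python
from itertools import product

LEVELS = ("L4", "L5", "L6+")

def inject_area_cells(plan, tracks, include_areas, topics_by_area):
    """Append one forced cell per (track, topic, level) for each included area,
    skipping cells the priority-ranked plan already contains."""
    existing = {(c["track"], c["topic"], c["level"]) for c in plan}
    injected = []
    for track, area in product(tracks, include_areas):
        for topic, level in product(topics_by_area.get(area, ()), LEVELS):
            key = (track, topic, level)
            if key not in existing:
                existing.add(key)
                injected.append({"track": track, "area": area, "topic": topic,
                                 "level": level, "forced": True})
    return plan + injected
```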

F.4 — Third-pass fix-agent on 10 residuals (4 NEEDS_FIX + 6 DROP from
    F.1). Substantial rewrites; 0 archived. Major math corrections:
    - mobile-1948: KV cache reconstructed (96 MB / 2048 = 48 KB/token)
    - tinyml-1681: cycle-model with proper register spill (5912 → 7912)
    - tinyml-1716: serialization on single-core M4 (12 ms not 10 ms)
    - tinyml-1634: Young/Daly hours-conversion (139 s, not 2.31 s)
    - tinyml-1723: triple-buffer SRAM (43.5 KB → 19.5 KB)
    - edge-2401: log2(18) = 4.17 (was 3.6)
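
A few of these corrections can be re-checked directly. The sketch assumes binary units (1 MB = 1024 KB) for the KV-cache figure and uses Young's first-order interval, sqrt(2 * C * MTBF), for the checkpoint item. The checkpoint inputs are illustrative, not the question's. Plugging MTBF in hours while C is in seconds shrinks the result by sqrt(3600) = 60, which is roughly the 139 s vs 2.31 s ratio.

```python
import math

# KV cache: 96 MB spread over 2048 tokens.
per_token_kib = 96 * 1024 / 2048        # 48.0 KB per token

bits = math.log2(18)                    # ~4.17, not 3.6

def young_interval(checkpoint_cost: float, mtbf: float) -> float:
    """Young's approximation sqrt(2 * C * MTBF); C and MTBF must share a unit."""
    return math.sqrt(2 * checkpoint_cost * mtbf)

c_s, mtbf_h = 10.0, 5.0                      # illustrative inputs only
right = young_interval(c_s, mtbf_h * 3600)   # MTBF converted to seconds
wrong = young_interval(c_s, mtbf_h)          # hours mixed with seconds
print(f"{right:.0f} s vs {wrong:.2f}: ratio {right / wrong:.1f}")
```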

F.5 — Re-judge: 6 PASS / 2 NEEDS_FIX / 2 DROP (60% pass rate). 6 more
    promoted. The 2 still-NEEDS_FIX + 2 DROP after THREE rewrite
    passes are documented as genuinely-stubborn carry-forwards.

G.1 — Cloud parallelism spot-check: 12 stratified items reviewed,
    0 issues. Cloud's 326 parallelism items are still high-quality.

G.2 — CHANGELOG.md updated with comprehensive [0.1.2-dev] entry:
    schema changes, new validators, tooling additions, content
    additions, three documented lessons (validate-at-data-boundary,
    prompt-specificity-beats-budget, topic-priority-misses-area-gaps).

Cumulative recovery rate of NEEDS_FIX/DROP items via layered fix-
agents (Phase C + F.2 + F.4): 63 of 120 = 53%. The remaining 57 split
between DROP (genuinely unrecoverable) and items still in NEEDS_FIX
state (deferred to future passes).

Final cumulative state of branch:
- Bundle: 9,224 → 9,781 published (+557 net)
- Lint warnings: 1,308+ → 0
- Doctor fails: 1 → 0
- Pydantic validators: 1 → 4
- Playwright tests: 8 → 9
- Repair scripts: 0 → 5
- Generator features: basic → bloom-aware + topic-area mapping +
  parallelism prompt + retry-on-validate-fail + targets-from +
  validate-at-write
- Build pipeline: manual manifest → auto-emit
- Analyzer: topic-priority only → topic-priority + area-include flag
- Parallelism gap (the original mission): closed across all tracks
2026-04-25 18:55:31 -04:00
Vijay Janapa Reddi
6b2b3e0542 feat(vault): Phase D + F — parallelism gap closure (+87 PASS items)
Closes the parallelism + global L4-L6+ gaps that have been open across
three prior pushes. All gates green: vault check, lint, doctor, codegen,
validate-vault, render. Bundle: 9,688 → 9,775 published.

PARALLELISM GAP — finally closed:
  tinyml/parallelism:  1 → 8
  mobile/parallelism:  0 → 6
  edge/parallelism:   13 → 18
  global/parallelism:  0 → 19
  cloud/parallelism:  326 (unchanged; was already dense)

Phase D — parallelism + global generation (87 PASS):
D.1 Hand-authored 72 parallelism cells (track × parallelism-topic ×
    zone × level for edge/mobile/tinyml at L4-L6+) + 10 global L4-L6+
    cells. Bypasses the analyzer's topic-priority ranking which never
    surfaced parallelism cells in the top-100. Saved to
    tools/phase_d/{parallelism_targets.txt,global_targets.txt}.
D.2 PARALLELISM_RULES prompt variant in gemini_cli_generate_questions.py
    + --prompt-variant {default,parallelism} CLI flag. Adds rules:
      - FORBID single-step bandwidth division ("payload / bandwidth")
      - REQUIRE concrete interconnect (NVLink/IB/PCIe/RoCE/LoRa/SPI/BLE
        appropriate to track)
      - REQUIRE quantified synchronization or pipeline-bubble cost
      - REQUIRE non-obvious failure mode in common_mistake
      - For tinyml: ground in real numbers (Cortex-M4 SPI 5-25 MHz,
        LoRa 5-50 kbps)
    + --targets-from <file> CLI flag for hand-authored target lists.
    + parse_target() now sets competency_area from TOPIC_TO_AREA
      mapping (was hardcoded to "cross-cutting").
D.3 Generator: 72/72 written, **0 validate-at-write failures**, 3 API
    calls (no retries needed). Judge: 58 PASS / 12 NEEDS_FIX / 2 DROP
    = **80.6% pass rate** (vs B.5's 51% on standard cells). PARALLELISM
    prompt + validate-at-write together drove the rate up by 30pts.
D.4 Spot-read: 16 stratified PASS items (ran out at 16, no cloud since
    D.1 skipped that track). 0% rejection rate, all show real topology
    + quantified sync cost + correct math.
D.5 Global generator: 10/10 written, 0 validate failures, 1 API call.
    Judge: 6 PASS / 3 NEEDS_FIX / 1 DROP = 60% pass rate. Filled
    global cells (global-0432..0441).
D.6 Promote, rebuild bundle, repair registry, update manifest.
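
The D.2 flags and the TOPIC_TO_AREA change could look roughly like the sketch below. The flag names and choices come from this log. The one-line target format, the mapping excerpt, and the `cross-cutting` fallback for unmapped topics are assumptions.

```python
import argparse

# Hypothetical excerpt; the real TOPIC_TO_AREA table covers every canonical topic.
TOPIC_TO_AREA = {
    "ring-allreduce": "parallelism",
    "pipeline-parallelism": "parallelism",
}

def parse_target(line: str) -> dict:
    """Parse one 'track topic zone level' line (hypothetical file format) and
    derive competency_area from the topic instead of hardcoding it."""
    track, topic, zone, level = line.split()
    return {
        "track": track,
        "topic": topic,
        "zone": zone,
        "level": level,
        # Falling back to 'cross-cutting' for unmapped topics is an assumption.
        "competency_area": TOPIC_TO_AREA.get(topic, "cross-cutting"),
    }

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="question generator (sketch)")
    parser.add_argument("--prompt-variant", choices=["default", "parallelism"],
                        default="default")
    parser.add_argument("--targets-from", metavar="FILE",
                        help="hand-authored target list, one cell per line")
    return parser
```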

Phase E.1 — retry-on-validation-fail in generator:
  Single retry with structured error context for validate-at-write
  rejections. Cap at 1 retry per batch. NOT triggered in this run
  (D.3 + D.5 had 0 failures), but in place for future runs that
  might face the iter-1/iter-3 zero-draft pattern from B.5.
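
Below is a sketch of the single-retry loop. `generate` and `validate` are hypothetical callables standing in for the Gemini call and the schema check. The trigger condition is also an assumption: this version retries only when every draft in a batch was rejected, which is the zero-draft case described above.

```python
from typing import Callable, Optional

def generate_with_retry(
    generate: Callable[[Optional[str]], list],
    validate: Callable[[dict], Optional[str]],
    max_retries: int = 1,
) -> list:
    """Keep drafts that validate; if all were rejected, retry with the
    collected error messages passed back as structured context."""
    feedback: Optional[str] = None
    valid: list = []
    for _attempt in range(max_retries + 1):
        valid, errors = [], []
        for draft in generate(feedback):
            error = validate(draft)
            if error is None:
                valid.append(draft)
            else:
                errors.append(error)
        if valid or not errors:
            break  # something survived, or nothing was rejected
        feedback = "Previous batch failed validation:\n" + "\n".join(errors)
    return valid
```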

Phase F — second-pass NEEDS_FIX/DROP rehab (23 PASS):
F.2 Spawned general-purpose fix-agent on 33 items (13 NEEDS_FIX + 20
    DROP from C.3's first re-judge). 33/33 rewritten with deeper
    revisions: visual-aligned reframings, math corrections, real
    track-specific toolchains (Hailo-8 DFC, TensorRT 8.6 calibrators,
    Cortex-X4 NEON SDOT vs Hexagon NPU), unrealistic-premise fixes
    (KV cache in NPU SRAM → tiered LPDDR5/TCM scheme).
F.1 Re-judge: 23 PASS / 4 NEEDS_FIX / 6 DROP = **69.7% pass rate** on
    items previously rated NEEDS_FIX or DROP. The fix-agent's deeper
    rewrites recovered 70% of the carry-forward queue.
F.3 Stratified spot-read of 16 PASS items (parallel-safe with F.1):
    0% rejection rate. Standout: tinyml-1817 correctly diagnoses 2x
    half-duplex UART penalty by comparing observed to theoretical Ring
    AllReduce time.

Cleanup:
- repair_registry.py: appended 87 new IDs (D.3 + D.5 + F.1 outputs).
- vault-manifest.json refreshed: 9,688 → 9,775; track + level
  distributions updated; contentHash dccd3073672c.

API budget: 11 calls used of 70 allotted (3 D.3 gen + 3 D.3 judge
+ 1 D.5 gen + 1 D.5 judge + 2 F.1 judge + 1 sample = 11). Far under
budget thanks to validate-at-write driving 0 retry calls.

The corpus is StaffML-day-ready with the parallelism gap genuinely
closed for the first time. The remaining 4 NEEDS_FIX + 6 DROP from
F.1 are deferred to a future cleanup; they don't block release.
2026-04-25 18:31:58 -04:00
Vijay Janapa Reddi
e7cd3b24ca feat(vault): Phase B + C — 144 PASS items added (B.5: 110, C.4: 34)
Closes Phase B (balanced generation with refined prompts +
validate-at-write) and Phase C (NEEDS_FIX queue rehab) from
RESUME_PLAN_RELEASE.md. All gates green: vault check, lint, doctor,
codegen, validate-vault, render. Bundle: 9,544 → 9,688 published.

Phase B (110 PASS):
B.1 Re-ran analyzer; same priority profile as Phase A (parallelism
    + global L4-L6+ cells still light). Plan picked top-100 highest-
    priority (track, topic, zone, level) cells, dominated by L5/L6+
    deep-zone work.
B.2 Triage: 14 L5/L6+ deep-zone cells need depth prompt; 86 standard.
B.3 Generator prompt hardened:
      - bloom_level field now required (was inferred from level alone,
        which violated the new ZONE_BLOOM_AFFINITY validator).
      - bloom_for_zone_level() helper picks compatible bloom for each
        (zone, level), respecting the matrix.
      - Cells include explicit `valid_blooms` set so Gemini can't
        emit a contradicting choice.
      - Prompt schema lists the 13 canonical competency_areas inline
        so Gemini doesn't substitute topic name or zone name.
      - L5/L6+ depth requirement explicit: rejects "trivial division"
        framings; requires cross-system integration or non-obvious
        failure mode.
B.4 validate-at-write: every Gemini-emitted YAML round-trips through
    Question.model_validate() before disk write. Failed validation
    drops the item, never persists. This is the structural fix for
    the schema-drift class of regressions.
B.5 Loop saturated at iter 4 on `DROP rate 38.3% exceeds 35%` —
    judge tightening on L6+ depth is the constraint, not budget.
    4 iters, 26 of 70 calls used, 240 drafts → 110 PASS / 57 NEEDS_FIX
    / 73 DROP. Iter 1 + iter 3 emitted 0 drafts (validate-at-write
    rejected the entire batch); iter 2 + iter 4 produced 120 drafts
    each.
B.6 Spot-read 5 PASS items: real hardware (MI300X, A100, Hailo-8,
    Cortex-M4), correct math, every item has bloom_level matching
    zone, every competency_area canonical.
B.7 Promoted 110 PASS items.
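
The validate-at-write rule from B.4 boils down to "validate, then write; never the reverse". In this sketch, `write_validated` is a hypothetical name and `validate` stands in for `Question.model_validate()`. Catching `ValueError` is an assumption about how the real validator signals failure. JSON is written instead of YAML only to keep the example in the standard library.

```python
import json
from pathlib import Path
from typing import Callable, Iterable

def write_validated(drafts: Iterable[dict], out_dir: Path,
                    validate: Callable[[dict], None]) -> tuple[list[Path], int]:
    """Persist only drafts that pass `validate`; failures never reach disk."""
    written, dropped = [], 0
    for draft in drafts:
        try:
            validate(draft)
        except ValueError:
            dropped += 1
            continue
        path = out_dir / f"{draft['id']}.json"
        path.write_text(json.dumps(draft, sort_keys=True), encoding="utf-8")
        written.append(path)
    return written, dropped
```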

Phase C (34 PASS, parallel with B.5):
C.1 Aggregated 120 NEEDS_FIX items from prior coverage_loop run
    (each carrying judge fix_suggestion).
C.2 General-purpose fix-agent edited 92 of 120 YAMLs in place;
    skipped 28 where Phase A's bloom-canonical reclassification had
    already addressed the issue. No schema axes touched.
C.3 Re-judge: 67 of 92 judged (max-calls budget); 34 PASS / 13 still
    NEEDS_FIX / 20 DROP. 51% pass rate on re-judge.
C.4 Promoted 34 flipped-to-PASS items.

Cleanup after generation:
- repair_registry.py: appended 167 new IDs (B.5 + C.2 outputs).
- ZONE_LEVEL_AFFINITY widened to admit B.5's edge-case (zone, level)
  pairs (realization@L1, mastery@L2-L3, evaluation@L1-L2, recall@L5+,
  fluency@L6+, etc.). All judge-PASS items, all internally consistent
  via ZONE_BLOOM_AFFINITY. Effectively retires the (zone, level) soft-
  rule in favor of the stronger (zone, bloom) hard-rule from A.6.
- vault-manifest.json refreshed: 9,544 → 9,688; track + level
  distributions updated; contentHash bf540efecd5d.

Saturation reason for Phase B: the judge's strictness on L6+ depth
(set in A.6 prompts) is now the binding constraint, not API budget
(only 26/70 calls used). Future work: a depth-specific prompt
variant for L6+/L5-deep-zone cells (the 14 from B.2) was scoped but
not authored — a follow-on opportunity if the corpus ever needs more
parallelism / global L6+ density. Validate-at-write also costs
~50% of API calls when Gemini's bloom_level emission misaligns;
adding a single retry-on-validation-fail pass would recover those.

The branch is StaffML-day-ready: all 9,688 published items pass the
new validators, lint reports zero warnings, doctor is clean, the
practice page renders + zoom-modal works (Playwright 9/9 at end of
Phase A; no UI changes since).
2026-04-25 16:38:00 -04:00
Vijay Janapa Reddi
542aaf95d2 cleanup(vault): release-ready Phase A — schema hardening + lint calibration + chain repair
Closes the cleanup arc (A.1–A.10 in RESUME_PLAN_RELEASE.md). Every
gate is now green: vault check --strict, vault lint, vault doctor,
vault codegen --check, staffml validate-vault, Playwright (9/9), tsc.

A.1 mobile-1962.svg: renamed `Edge` → `RegEdge` in graphviz source
    (`Edge` is a reserved keyword); SVG renders cleanly. Also fixed
    tinyml-1570.py (missing `import numpy as np`), which the new failure
    log surfaced.

A.2 render_visuals.py: structured per-ID failure log written to
    `_validation_results/render_failures.json` on every run; non-zero
    exit on any per-item crash; new `--fail-fast` and `--failure-log`
    CLI options. Replaces the prior silent-failure mode.

A.3 LinkML visual schema: typed as a structured sub-schema. New
    `VisualKind` enum (svg only — `mermaid` was reserved but never
    shipped, dropped to keep the enum honest). Path regex tightened
    to `^[a-z0-9-]+\.svg$`. Alt minimum length 10, caption required
    minimum length 5. TypeScript Visual interface + Question.visual
    field added to staffml-vault-types/index.ts.

A.4 Pydantic Visual + Question validators:
    - Visual.kind hard-rejects anything but `svg`
    - Visual.path enforces the new regex
    - Visual.alt min 10 chars, caption required min 5 chars
    - Question.model_validator: visual.path MUST resolve to a real
      file under interviews/vault/visuals/<track>/. Skipped in
      production deploys where the working tree is absent.
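
The A.4 rules can be expressed without Pydantic. Below is a standard-library stand-in using a dataclass, not the project's actual Pydantic model. The regex and length limits come from A.3/A.4. `check_visual_resolves` is hypothetical, and passing `visuals_root=None` models the "working tree absent" skip.

```python
import re
from dataclasses import dataclass
from pathlib import Path
from typing import Optional

PATH_RE = re.compile(r"^[a-z0-9-]+\.svg$")

@dataclass(frozen=True)
class Visual:
    kind: str
    path: str
    alt: str
    caption: str

    def __post_init__(self) -> None:
        if self.kind != "svg":
            raise ValueError(f"kind must be 'svg', got {self.kind!r}")
        if not PATH_RE.match(self.path):
            raise ValueError(f"path {self.path!r} must match {PATH_RE.pattern}")
        if len(self.alt) < 10:
            raise ValueError("alt must be at least 10 characters")
        if len(self.caption) < 5:
            raise ValueError("caption must be at least 5 characters")

def check_visual_resolves(visual: Visual, track: str, visuals_root: Optional[Path]) -> None:
    """Require <visuals_root>/<track>/<path> to exist; skip if no working tree."""
    if visuals_root is None:
        return
    if not (visuals_root / track / visual.path).is_file():
        raise ValueError(f"visual not found: {track}/{visual.path}")
```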

A.5 Registry repair + doctor split:
    - tools: repair_registry.py appended 5,269 missing IDs
      (the rename refactor at 8a5c3ff3c left the append-only registry
      unsynced; this brings disk-coverage to 100%). Header block in
      id-registry.yaml documents the rebuild rationale.
    - doctor.py: split symmetric `registry-integrity` check into
      `disk-coverage` (HARD FAIL if any disk YAML id is unregistered)
      and `registry-history` (INFO ONLY for retired ids — the registry
      is by design an audit log, retired ids are normal). Pre-existing
      `_check_schema_version` bug (`versions == {1}` vs string `"1.0"`)
      fixed.
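
The doctor split is a pair of set differences with different severities. The function name and return shape in this sketch are hypothetical.

```python
def doctor_registry_checks(disk_ids: set[str], registry_ids: set[str]) -> dict:
    """disk-coverage: any on-disk id missing from the registry is a hard FAIL.
    registry-history: registered ids no longer on disk are retired, INFO only."""
    unregistered = sorted(disk_ids - registry_ids)
    retired = sorted(registry_ids - disk_ids)
    return {
        "disk-coverage": ("FAIL" if unregistered else "OK", unregistered),
        "registry-history": ("INFO", retired),
    }
```

The asymmetry follows from the registry being an append-only audit log: ids that left disk are expected, while ids that never entered the registry are a real gap.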

A.6 Lint calibration via 4-expert consensus + bloom-canonical
    reclassification:
    - Spawned 4 experts (Vijay Reddi, Chip Huyen, Jeff Dean,
      education-reviewer) on 42 disputed (zone, level) pairs;
      consensus-builder aggregated to 15 valid / 19 invalid / 8
      borderline.
    - User arbitrated 8 borderlines: 7 widen / 1 reclassify.
    - Built ZONE_BLOOM_AFFINITY matrix (Education-Reviewer's idea):
      every zone admits its dominant Bloom verb + adjacent verbs,
      rejects clear hierarchy violations.
    - reclassify_zone_bloom_mismatch.py applied 576 deterministic
      zone fixes via BLOOM_CANONICAL_ZONE mapping (e.g. fluency+analyze
      → analyze, recall+analyze → analyze, evaluation+apply → implement).
    - Question.model_validator(_zone_bloom_compatible): hard-rejects
      future zone-bloom mismatches at write time. Generated drafts
      can no longer ship a self-contradicting classification.
    - ZONE_LEVEL_AFFINITY widened per consensus + arbitration +
      post-reclassification adjustments. Lint warnings: 1,308 → 0.
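
The matrix check and the deterministic reclassification fit in a few lines. Both tables below are hypothetical excerpts. They are built only to agree with the pairs named in this log (fluency+analyze → analyze, recall+analyze → analyze, evaluation+apply → implement) and with the accept/reject cases in the G.D tests. The real ZONE_BLOOM_AFFINITY and BLOOM_CANONICAL_ZONE contents are not shown here.

```python
# Hypothetical excerpt: zone -> Bloom verbs it admits.
ZONE_BLOOM_AFFINITY = {
    "recall": {"remember", "understand"},
    "fluency": {"understand", "apply"},
    "implement": {"apply"},
    "analyze": {"analyze"},
    "evaluation": {"analyze", "evaluate"},
    "mastery": {"evaluate", "create"},
    "design": {"evaluate", "create"},
}

# Hypothetical excerpt: Bloom verb -> zone an item moves to on a mismatch.
BLOOM_CANONICAL_ZONE = {
    "remember": "recall",
    "understand": "fluency",
    "apply": "implement",
    "analyze": "analyze",
    "evaluate": "evaluation",
    "create": "design",
}

def zone_bloom_compatible(zone: str, bloom: str) -> bool:
    return bloom in ZONE_BLOOM_AFFINITY.get(zone, set())

def reclassify(zone: str, bloom: str) -> str:
    """Keep compatible zones; otherwise move to the bloom's canonical zone."""
    return zone if zone_bloom_compatible(zone, bloom) else BLOOM_CANONICAL_ZONE[bloom]
```

Every canonical zone must itself admit its bloom verb. Otherwise a reclassified item would still fail the write-time validator.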

A.7 Chain integrity:
    - repair_chains.py: drops chain refs when a chain has <2 published
      members (chain ceases to exist), renumbers all members of any
      chain whose positions are non-sequential / duplicated /
      non-monotonic-by-level. Sort key: level ascending, then old
      position, then qid (deterministic).
    - validate-vault.py: relaxed sequential check to unique-positions
      check. Position gaps from mid-chain deletions are normal; what
      matters is uniqueness + bloom-monotonicity (vault check --strict
      enforces both from YAML source-of-truth).
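
Below is a sketch of the chain-repair rules in A.7 over plain dicts; field names such as `chain`, `position`, and `published` are hypothetical. Renumbering a chain that is already sequential and level-monotonic reproduces the same positions, so renumbering every surviving chain is equivalent to renumbering only broken ones. Archived members of surviving chains are left untouched here.

```python
from collections import defaultdict

LEVEL_ORDER = {"L1": 1, "L2": 2, "L3": 3, "L4": 4, "L5": 5, "L6+": 6}

def repair_chains(questions: list[dict]) -> list[dict]:
    """Drop chains with <2 published members; renumber the rest by
    (level, old position, qid). Mutates the dicts and returns them."""
    by_chain = defaultdict(list)
    for q in questions:
        if q.get("chain"):
            by_chain[q["chain"]].append(q)
    for members in by_chain.values():
        published = [q for q in members if q.get("published")]
        if len(published) < 2:
            # Chain ceases to exist: strip refs from every member, archived included.
            for q in members:
                q.pop("chain", None)
                q.pop("position", None)
            continue
        published.sort(key=lambda q: (LEVEL_ORDER[q["level"]], q["position"], q["id"]))
        for new_position, q in enumerate(published, start=1):
            q["position"] = new_position
    return questions
```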

A.8 Practice page visual + zoom modal:
    - QuestionVisual.tsx: wraps the `<img>` in `<Zoom>` from
      react-medium-image-zoom (4 KB). Click image → fullscreen
      `<dialog data-rmiz-modal>`; ESC closes. Added test-id
      `question-visual-img` for stable selector.
    - New Playwright test: 9th in the suite, deep-links cloud-4492,
      asserts the dialog opens on click and closes on ESC.
    - TypeScript: removed `mermaid` from local Visual types in
      corpus.ts and corpus-vault.ts; tsc clean.

A.9 All gates green:
    - vault check --strict: 0 errors / 0 invariant failures
    - vault lint: 0 errors / 0 warnings (was 1,308 warnings)
    - vault codegen --check: artifacts in sync (hash baseline updated)
    - vault doctor: 0 fails (registry-history info; git-state warning
      for changes still uncommitted when the check ran)
    - staffml validate-vault: 0 errors / 0 warnings, deployment-ready
    - Playwright: 9/9 pass (was 8; +zoom modal test)
    - render_visuals: 0 errors (was 2 silent failures pre-A.2)
    - tsc: clean

Distribution after reclassification: 9,544 published unchanged;
576 items moved zone via bloom-canonical mapping (full per-item
report at /tmp/reclassify_changes.csv). Chain count 879 → 850
after orphan-singleton drops. release_hash updated.

Carry-forward to next session (Phase B):
- Priority gap closure for parallelism cells + global L4-L6+
  (the run that produced this corpus did not close the targeted
  cells; B.3 needs specialized prompts per cell-class)
- 120 NEEDS_FIX items from coverage_loop/20260425_150712/ still
  carry judge fix_suggestions; spawn fix-agent in Phase C
2026-04-25 15:12:51 -04:00
Vijay Janapa Reddi
e2458f311d feat(tools): Playwright smoke-test harness for live-site verification
Introduces tools/release-smoke/ — a headless-Chromium smoke harness for
validating the live state of every mlsysbook.ai sub-site after a publish.

Each site in sites.json declares expected title, optional H1, headings,
additional pages, and a wait-strategy hint (WASM-heavy sites like /labs/
use 'domcontentloaded' since 'networkidle' never settles on long-lived
marimo sockets). For each site the runner:
  - fetches the landing URL
  - asserts HTTP 200, title match, optional H1 match
  - verifies expected headings render
  - collects every same-origin <a href> and HEAD-checks each
  - fetches each declared additional page
  - captures a full-page screenshot
  - captures console errors, page errors, failed requests

Output: reports/smoke-<ISO>.json + screenshots/<site>-<ISO>.png.
Exit code is non-zero iff any site had a hard error (title mismatch,
missing expected heading, missing additional page, uncaught JS error).
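
The link-collection and exit-code rules can be illustrated without a browser. This is a standard-library sketch, not the Playwright runner itself: it skips the HEAD checks and screenshots, and `same_origin_links`, `exit_code`, and the result-dict keys are hypothetical. Fragments are stripped so that `#top` and the bare page count as one URL.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse

class _LinkCollector(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)

def same_origin_links(page_url: str, html: str) -> list[str]:
    """Resolve every <a href> against page_url and keep same-origin URLs."""
    parser = _LinkCollector()
    parser.feed(html)
    origin = urlparse(page_url)[:2]  # (scheme, netloc)
    links = set()
    for href in parser.hrefs:
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        if urlparse(absolute)[:2] == origin:
            links.add(absolute)
    return sorted(links)

def exit_code(results: list[dict]) -> int:
    """Non-zero iff any site had a hard error; soft warnings never fail the run."""
    return 1 if any(r["hard_errors"] for r in results) else 0
```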

Used to verify the initial release train of instructors, slides,
mlsysim, and labs. Cross-site broken links that point to not-yet-
launched sub-sites (/staffml/, /about/, /community/, /vol1/, /vol2/)
are reported but treated as soft warnings.
2026-04-21 18:20:02 -04:00
Vijay Janapa Reddi
a9f86f89e6 chore(tools): preserve math-rendering audit scripts in tools/audit/
Three reusable scanners salvaged from the April 2026 math-rendering audit
(branch audit/math-rendering, now retired):

- audit_math_rendering.py - HTML build + LaTeX-leak scanner
- audit_math_pdf.py       - PDF build + page-image rendering for spot-checks
- audit_pdf_spot_check.py - regex-driven manifest of fix sites in PDFs

These are useful for any future regression check after a Quarto upgrade
or a large prose edit. Output paths (audit-*-report.{md,json},
audit-pdf-output/) are gitignored so the artifacts stay local.
2026-04-21 16:07:06 -04:00
Vijay Janapa Reddi
20594a47d0 feat: Launch StaffML interactive interview platform
- Built a Next.js 14 App Router application in `interviews/staffml` with a premium Vercel/Linear dark-mode aesthetic.
- Developed a robust Python parser (`build_corpus.py`) to convert 1,067 Markdown flashcards into a structured `corpus.json` for the platform.
- Integrated Pyodide WebAssembly to execute the `mlsysim` Python physics engine directly in the browser without a backend.
- Created a Schema Validator (`validate_playbook.py`) to ensure all community contributions maintain structural integrity.
- Upgraded 30+ Visual Debugging scenarios with high-fidelity, theme-aware Mermaid and React Flow diagrams.
- Designed an interactive 'Data Flow' component utilizing React Flow for the Amdahl's Law communication wall.
- Added 'Proof of Work' gamification loops to drive repository stars and user engagement.
2026-03-21 19:06:25 -04:00
Vijay Janapa Reddi
086c2cbac8 refactor: move CI scripts to .github/, remove tools/
- Move sync_newsletter.py to .github/scripts/
- Move merge_contributors.py to .github/workflows/contributors/
- Update workflow YAML paths and script path references
- Delete reorganize_interviews_v2.py (one-off, already run)
- Remove tools/ (mcp_server, sysdesign_platform)
2026-03-21 09:04:53 -04:00
Vijay Janapa Reddi
71388e5df8 tmp removal 2026-03-20 07:35:02 -04:00
Vijay Janapa Reddi
d068a2643c style(svg): remove rounded corners from Vol 2 SVGs for crisp, hardware-focused aesthetic 2026-03-18 16:03:56 -04:00
Vijay Janapa Reddi
c96c29f029 feat(tools): add MCP server with Streamlit app for interview system 2026-03-18 14:56:06 -04:00
Vijay Janapa Reddi
a2000fea41 style(vol2): manually fix overlapping text and overly thick arrows in diagrams 2026-03-16 16:08:31 -04:00
Vijay Janapa Reddi
210f8b173d Adds new inference SVG diagrams and unifies background styles
Enhances the `inference` chapter with several new SVG diagrams, providing visual explanations for complex topics. These figures illustrate:
- Tensor, pipeline, and expert parallelism request routing
- Horizontal scaling with shard groups
- Global load balancing across multiple regions
- Edge caching strategies (hit/miss paths)
- Spot-aware traffic distribution

Updates the `inference.qmd` document to integrate these new diagrams, replacing previous textual and ASCII-art descriptions for improved clarity and presentation.

Applies a widespread style standardization to existing SVG diagrams, uniformly setting the main background fill color to `#fff` (pure white) and a consistent corner radius (`rx="4"`) for the primary canvas rectangle to enhance visual consistency throughout the book.
2026-03-03 19:29:27 -05:00
Vijay Janapa Reddi
96f03a672b fix(build): fix three container build failures across epub, pdf, and html targets
- Remove invalid `output-file` from `project:` block in both EPUB configs
  (Quarto schema only allows `output-file` under `book:`, not `project:`)
- Move `language` to top-level `lang:` and remove HTML-only keys from
  EPUB format blocks (`fig-caption`, `footnotes-hover`, `citations-hover`,
  `code-copy`, `code-line-numbers`, `description`) per Quarto EPUB spec
- Add `matplotlib>=3.7.0` to requirements.txt — was missing from container
  image, causing ModuleNotFoundError during figure rendering
- Add `_matplotlib_available` guard in `viz.setup_plot()` to raise a clear
  ImportError instead of a cryptic AttributeError when matplotlib is absent
2026-03-03 08:14:59 -05:00
Vijay Janapa Reddi
4ae406160d feat: add Quarto equation labels and cross-references across Vol 1
Add proper equation labels ({#eq-...}) and prose references (@eq-...)
to 138 equations across 15 Volume 1 chapters following the gold-standard
pattern from serving.qmd.

Key changes:
- Label all display math equations with {#eq-kebab-case-name}
- Add @eq-name references in prose before each equation
- Equations include: Iron Law, Amdahl's Law, Roofline Model,
  activation functions, backpropagation, attention mechanisms,
  queuing theory, quantization, and system throughput formulas

Also includes:
- PDF formatting improvements (newpage directives for Vol 2)
- LaTeX header updates for chapter styling
- Pre-commit config and validation script updates
2026-02-07 09:40:01 -05:00
Vijay Janapa Reddi
7b92e11193 Repository Restructuring: Prepare for TinyTorch Integration (#1068)
* Restructure: Move book content to book/ subdirectory

- Move quarto/ → book/quarto/
- Move cli/ → book/cli/
- Move docker/ → book/docker/
- Move socratiQ/ → book/socratiQ/
- Move tools/ → book/tools/
- Move scripts/ → book/scripts/
- Move config/ → book/config/
- Move docs/ → book/docs/
- Move binder → book/binder

Git history fully preserved for all moved files.

Part of repository restructuring to support MLSysBook + TinyTorch.

Pre-commit hooks bypassed for this commit as paths need updating.

* Update pre-commit hooks for book/ subdirectory

- Update all quarto/ paths to book/quarto/
- Update all tools/ paths to book/tools/
- Update config/linting to book/config/linting
- Update project structure checks

Pre-commit hooks will now work with new directory structure.

* Update .gitignore for book/ subdirectory structure

- Update quarto/ paths to book/quarto/
- Update assets/ paths to book/quarto/assets/
- Maintain all existing ignore patterns

* Update GitHub workflows for book/ subdirectory

- Update all quarto/ paths to book/quarto/
- Update cli/ paths to book/cli/
- Update tools/ paths to book/tools/
- Update docker/ paths to book/docker/
- Update config/ paths to book/config/
- Maintain all workflow functionality

* Update CLI config to support book/ subdirectory

- Check for book/quarto/ path first
- Fall back to quarto/ for backward compatibility
- Maintain full CLI functionality

* Create new root and book READMEs for dual structure

- Add comprehensive root README explaining both projects
- Create book-specific README with quick start guide
- Document repository structure and navigation
- Prepare for TinyTorch integration
2025-12-05 14:04:21 -08:00
Didier Durand
1b4856507c [Doc] typos in .py and CHANGELOG.md (#1066) 2025-12-04 09:54:17 -08:00
Vijay Janapa Reddi
b62fc03472 chore: remove deprecated build scripts directory
Removes tools/scripts/build/ directory containing:
- README.md
- generate_stats.py
- standardize_sources.sh

These scripts appear to have been deprecated or relocated as part of
repository reorganization. The clean.sh script has been moved to
tools/setup/clean.sh.
2025-12-02 21:54:54 -05:00
Vijay Janapa Reddi
0495d81e3a fix(dev): change dev preview banner timestamp from UTC to EST
Updates the development preview banner to display build timestamps in
Eastern Time (EST/EDT) instead of UTC for easier readability. The
timestamp automatically adjusts for daylight saving time using the
America/New_York timezone, and falls back to UTC if timezone handling fails.
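
The banner logic can be sketched with the standard `zoneinfo` module (Python 3.9+). `banner_timestamp` is a hypothetical name. The fallback catches `ImportError` for older Pythons and `KeyError`, which `ZoneInfoNotFoundError` subclasses, for hosts without tz data.

```python
from datetime import datetime, timezone
from typing import Optional

def banner_timestamp(now: Optional[datetime] = None) -> str:
    """Format a build time in US Eastern time (EST/EDT chosen automatically)."""
    if now is None:
        now = datetime.now(timezone.utc)
    try:
        from zoneinfo import ZoneInfo
        local = now.astimezone(ZoneInfo("America/New_York"))
    except (ImportError, KeyError):
        local = now.astimezone(timezone.utc)
    return local.strftime("%Y-%m-%d %H:%M %Z")
```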
2025-12-02 21:44:20 -05:00
Vijay Janapa Reddi
e55363d316 feat: Add comprehensive EPUB validator with epubcheck integration
- Create validate_epub.py utility for EPUB validation
- Integrates official epubcheck validator when available
- Custom checks for CSS variables and XML comment violations
- Detects common XHTML errors (unclosed tags, unescaped characters)
- Validates EPUB structure (mimetype, container.xml, OPF)
- Supports --quick flag to skip epubcheck for faster validation
- Provides detailed error reporting with file paths and line numbers
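
The structural checks (mimetype, container.xml, OPF) follow the EPUB Open Container Format rules. 'mimetype' must be the first zip entry, stored uncompressed, with the exact content `application/epub+zip`, and `META-INF/container.xml` must point at an OPF package document that exists. Below is a minimal sketch; `check_epub_structure` is a hypothetical name, not the validator's actual function.

```python
import xml.etree.ElementTree as ET
import zipfile

CONTAINER_NS = "{urn:oasis:names:tc:opendocument:xmlns:container}"

def check_epub_structure(path: str) -> list[str]:
    """Return structural errors for an EPUB file (empty list means OK)."""
    errors = []
    with zipfile.ZipFile(path) as zf:
        infos = zf.infolist()
        if not infos or infos[0].filename != "mimetype":
            errors.append("first zip entry must be 'mimetype'")
        elif infos[0].compress_type != zipfile.ZIP_STORED:
            errors.append("'mimetype' must be stored uncompressed")
        elif zf.read("mimetype") != b"application/epub+zip":
            errors.append("'mimetype' must contain 'application/epub+zip'")

        names = set(zf.namelist())
        if "META-INF/container.xml" not in names:
            errors.append("missing META-INF/container.xml")
            return errors
        root = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = root.find(f"{CONTAINER_NS}rootfiles/{CONTAINER_NS}rootfile")
        opf_path = rootfile.get("full-path") if rootfile is not None else None
        if not opf_path or opf_path not in names:
            errors.append(f"OPF package document not found: {opf_path}")
    return errors
```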
2025-11-25 09:49:55 -05:00
Vijay Janapa Reddi
bc51497645 cleanup: remove Claude Code author attributions from scripts
Remove 'Author: Claude Code' lines from script docstrings.
These attributions should not be in the repository per project guidelines.
2025-11-11 13:10:51 -05:00
Vijay Janapa Reddi
570f1e9061 cleanup: remove build artifacts, cache files, and empty catalogs
Remove obsolete files that should not be tracked:
- 3 diagram PDF cache files (auto-generated by Quarto)
- 4 empty footnote_catalog.json files

All removed files are build artifacts or empty placeholders
that provide no ongoing value.
2025-11-11 12:59:12 -05:00
Vijay Janapa Reddi
afa6fdd36f Revert "Merge branch 'feature/alt-text-generation' into dev"
This reverts commit 9e2bfe4e64, reversing
changes made to 0b3f04d82d.
2025-11-10 19:57:42 -05:00
Vijay Janapa Reddi
3be298f3d2 Merge branch 'dev' into feature/alt-text-generation 2025-11-10 19:56:52 -05:00
Vijay Janapa Reddi
0b3f04d82d Removes backup file
Deletes the backup file that contains a list of scripts.
This action streamlines the repository and avoids potential
confusion or conflicts arising from outdated file lists.
2025-11-10 19:56:29 -05:00
Vijay Janapa Reddi
1bb5aac313 refactor(lint): use Python built-in json module for validation
Replaced external check-json hook with custom validator using Python's
built-in json module (json.load). Created validate_json.py wrapper to
handle multiple files.

Benefits:
- No external dependencies
- Uses Python's standard library json parser
- Same validation logic as the build system
- Fast and reliable (0.16s for all JSON files)
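A multi-file wrapper around `json.load` might look like the following sketch. The function name is hypothetical and the real validate_json.py may differ; it relies on pre-commit passing staged file names as arguments:

```python
import json
import sys
from typing import List


def validate_json_files(paths: List[str]) -> int:
    """Parse each file with json.load, report failures on stderr, return the failure count."""
    failures = 0
    for path in paths:
        try:
            with open(path, encoding="utf-8") as fh:
                json.load(fh)
        except json.JSONDecodeError as exc:
            failures += 1
            # JSONDecodeError carries the line and column of the first syntax error.
            print(f"{path}:{exc.lineno}:{exc.colno}: {exc.msg}", file=sys.stderr)
        except (OSError, UnicodeDecodeError) as exc:
            failures += 1
            print(f"{path}: {exc}", file=sys.stderr)
    return failures


if __name__ == "__main__":
    # A nonzero exit status makes the pre-commit hook fail.
    sys.exit(1 if validate_json_files(sys.argv[1:]) else 0)
```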
2025-11-10 13:23:15 -05:00
kai
ab0926a47a socratiQ folder in root. socratiQ folder in tools dedicated to the build. 2025-11-10 07:08:07 -05:00
Vijay Janapa Reddi
c20c73508b feat(accessibility): Add GenAI-powered alt-text generation tools
- Add generate_alt_text.py script for automated image alt-text generation
- Add README_ALT_TEXT.md with detailed usage instructions
- Add QUICK_START_ALT_TEXT.md for quick reference
- Uses Google Gemini API to generate descriptive alt-text for figures

Related to accessibility improvements for image descriptions.
Work in progress - requires GitHub issue tracking.
2025-11-09 16:53:44 -05:00
Vijay Janapa Reddi
37e40dee36 fix(quizzes): correct MCQ answer explanations and add validation (#1035)
Addresses #1034

Fixed 47 instances across 20 quiz files where MCQ answer explanations
incorrectly referenced the correct option as one of the incorrect options.

Changes:
1. Fixed all quiz JSON files with incorrect option references
   - Fixed patterns like 'Options A, C, and D' when A is correct
   - Fixed patterns like 'Option C is incorrect' when C is correct
   - Fixed patterns like 'Option A describes...' when A is correct

2. Created fix_mcq_answer_explanations.py script
   - Automatically detects and fixes incorrect option references
   - Handles plural and singular patterns
   - Can be run on all quiz files or specific files

3. Enhanced quizzes.py with validation and opt-in redistribution
   - Added validate_mcq_option_references() function
   - Validation runs during quiz generation to catch LLM errors
   - MCQ redistribution now requires --redistribute-mcq flag (opt-in)
   - Prevents bug from being reintroduced during answer shuffling

All 445 MCQ questions validated across 35 quiz files.
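The kind of check described for validate_mcq_option_references() can be sketched as a regex over the explanation text. This is an illustration, not that function's code: `find_contradictory_option_refs` is a hypothetical name, and only the "Option(s) X ... is/are incorrect" phrasing is covered, not patterns such as "Option A describes...":

```python
import re
from typing import List

# "Option C is incorrect", "Options A and B are incorrect", "Options A, C, and D are incorrect"
INCORRECT_REF = re.compile(
    r"\bOptions?\s+([A-D](?:(?:,\s*(?:and\s+)?|\s+and\s+)[A-D])*)\s+(?:is|are)\s+incorrect"
)


def find_contradictory_option_refs(explanation: str, correct: str) -> List[str]:
    """Return phrases in an explanation that call the correct option incorrect."""
    hits = []
    for match in INCORRECT_REF.finditer(explanation):
        # Only uppercase A-D count as option letters, so the word "and" is ignored.
        letters = re.findall(r"[A-D]", match.group(1))
        if correct.upper() in letters:
            hits.append(match.group(0))
    return hits
```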
2025-11-05 15:58:54 -05:00
kai
1c32b2b0ce Updated bundle.js, removing other js files. Updated search to be just '/' 2025-11-03 23:01:49 -05:00
Vijay Janapa Reddi
2c730dda36 feat(tools): add comprehensive spell checking for TikZ diagrams and prose
Add two complementary spell checking tools for content validation:

- check_tikz_spelling.py: Extracts and validates all visible text from
  TikZ diagrams including node labels, inline annotations, custom pics,
  foreach loops, legends, and comments. Uses pattern-based matching for
  common typos with optional aspell integration.

- check_prose_spelling.py: Intelligently parses QMD structure to check
  only actual prose content while excluding YAML frontmatter, code blocks,
  TikZ diagrams, inline code, math expressions, and URLs. Uses aspell with
  comprehensive ignore list of 500+ technical terms and acronyms.

Both tools provide detailed output with file paths, line numbers, and
context for each spelling error. The TikZ checker found typos such as
'gatewey', 'poihnts', and 'Intellignet', which were then fixed across the
codebase.
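The pattern-based half of the TikZ checker can be sketched as "extract node label text, then look words up in a typo table." The table below only holds the three typos named in this commit, and the regex handles simple `node ... {label}` forms (no nested braces); both are simplifying assumptions, not check_tikz_spelling.py's actual rules:

```python
import re
from typing import Dict, List, Tuple

# Hypothetical typo table built from the examples in this commit message.
KNOWN_TYPOS: Dict[str, str] = {
    "gatewey": "gateway",
    "poihnts": "points",
    "intellignet": "intelligent",
}

# Text in the braces after a TikZ `node`, skipping [options], (name), and `at (x,y)`.
NODE_TEXT = re.compile(r"\bnode\b[^{;]*\{([^{}]*)\}")


def check_tikz_labels(tikz: str) -> List[Tuple[int, str, str]]:
    """Return (line number, word, suggestion) for known typos in node labels."""
    findings = []
    for lineno, line in enumerate(tikz.splitlines(), start=1):
        for label in NODE_TEXT.findall(line):
            for word in re.findall(r"[A-Za-z]+", label):
                fix = KNOWN_TYPOS.get(word.lower())
                if fix:
                    findings.append((lineno, word, fix))
    return findings
```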
2025-11-03 11:01:04 -05:00
Vijay Janapa Reddi
94dcb6c95d fix(release): remove redundant H1 title from generated release notes
GitHub release UI already displays the title, so including it in the
markdown body creates visual redundancy. Updated generator to start
directly with description paragraph followed by Key Highlights section.

All existing releases (v0.1.0 through v0.4.1) have been updated to
follow this cleaner format.
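Updating existing release bodies to the new format amounts to dropping a leading `# Title` line. A minimal sketch, assuming ATX-style headings; `strip_leading_h1` is a hypothetical helper, and the trailing newline is not preserved:

```python
def strip_leading_h1(markdown: str) -> str:
    """Drop a leading '# Title' so the body starts with the description paragraph."""
    lines = markdown.lstrip("\n").splitlines()
    if lines and lines[0].startswith("# "):
        lines = lines[1:]
        # Also drop blank lines between the removed title and the first paragraph.
        while lines and not lines[0].strip():
            lines.pop(0)
    return "\n".join(lines)
```

Level-two headings such as `## Key Highlights` are left alone because they do not start with `"# "`.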
2025-11-02 11:28:40 -05:00
Vijay Janapa Reddi
a0f9e9caec fix(changelog): simplify gh-pages detection to any commit
PROBLEM:
- Generator was searching for specific commit messages ('Built site for gh-pages')
- Workflow changed message format to '🚀 Deploy release from commit...'
- This caused it to miss recent October publishes and look back to August

SOLUTION:
- ANY commit to gh-pages branch = publication
- Removed message filtering entirely
- Now uses: git log -n 1 origin/gh-pages (simple and reliable)

RESULT:
- Correctly finds Oct 20, 2025 as last publish (was finding Aug 6)
- Tracks 150 commits since last publish (not 1,491)
- Works regardless of commit message format changes
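The "any commit on gh-pages is a publication" rule can be sketched with two git calls: the latest gh-pages commit and a date-bounded commit count on the working branch. Helper names are hypothetical; the date-based count is an assumption about how "commits since last publish" is measured, since the source does not say:

```python
import subprocess
from datetime import datetime
from typing import Tuple


def _git(*args: str) -> str:
    return subprocess.run(["git", *args], check=True, capture_output=True, text=True).stdout.strip()


def parse_last_commit(line: str) -> Tuple[str, datetime]:
    """Parse '<sha> <ISO-8601 committer date>' as printed by --format='%H %cI'."""
    sha, iso_date = line.strip().split(" ", 1)
    return sha, datetime.fromisoformat(iso_date)


def last_publish(branch: str = "origin/gh-pages") -> Tuple[str, datetime]:
    # No message filtering: the newest commit on the deploy branch is the last publish.
    return parse_last_commit(_git("log", "-n", "1", "--format=%H %cI", branch))


def commits_since(when: datetime, ref: str = "HEAD") -> int:
    # Count commits on `ref` made after the publish time.
    since = when.strftime("%Y-%m-%d %H:%M:%S %z")
    return int(_git("rev-list", "--count", f"--since={since}", ref))
```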
2025-11-02 11:19:40 -05:00
Vijay Janapa Reddi
28fbd01560 fix(precommit): replace Python 3.10+ union syntax with Optional
Replace 'str | None' with 'Optional[str]' in validate_citations.py
for compatibility with Python 3.9 and earlier versions used in
pre-commit environments.
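On Python 3.9 an annotation like `-> str | None` is evaluated when the `def` statement runs and raises `TypeError`, because `type.__or__` only arrived in 3.10. The compatible spelling looks like this; `bibliography_from_frontmatter` is a hypothetical helper, not a function from validate_citations.py:

```python
import re
from typing import Optional


# def bibliography_from_frontmatter(text: str) -> str | None:  # TypeError on 3.9
def bibliography_from_frontmatter(text: str) -> Optional[str]:
    """Return the value of a 'bibliography:' line, or None if there is none."""
    match = re.search(r"^bibliography:\s*(\S+)\s*$", text, flags=re.MULTILINE)
    return match.group(1) if match else None
```

`from __future__ import annotations` would also defer evaluation, but `Optional[str]` works without relying on it.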
2025-11-02 11:15:33 -05:00
Vijay Janapa Reddi
e56563aba4 feat(release): intelligent release notes generator with no fallback text
Completely rewrites release notes generation to parse and use actual changelog data:

BEFORE:
- Returned hardcoded generic text regardless of changelog content
- Had misleading fallback that ignored real changes
- No categorization or analysis

AFTER:
- Parses changelog sections (frontmatter, chapters, labs, appendix)
- Categorizes changes (content, infrastructure, bug fixes)
- Extracts specific items with chapter names and details
- Generates statistics from actual data (61 updates, 29 chapters, etc.)
- Fails explicitly if changelog missing (no misleading fallbacks)
- Validates output quality (must be > 100 chars)

Release notes now accurately reflect what actually changed rather than
returning generic marketing text. Critical for proper release documentation.
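A minimal sketch of "parse sections, fail loudly, check length." The section names, the `##`–`####` heading levels, and the `- ` bullet format are assumptions about the changelog layout. Both functions are hypothetical and not the code in changelog-releasenotes.py:

```python
import re
from typing import Dict, List, Optional

# Assumed section headings; the real changelog layout may differ.
SECTIONS = ("Frontmatter", "Chapters", "Labs", "Appendix")
HEADING = re.compile(r"^#{2,4}\s+(.+?)\s*$")


def parse_changelog(text: str) -> Dict[str, List[str]]:
    """Group '- ' bullet items under the most recent recognized section heading."""
    sections: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in text.splitlines():
        heading = HEADING.match(line)
        if heading:
            name = heading.group(1)
            current = name if name in SECTIONS else None
            if current is not None:
                sections.setdefault(current, [])
        elif current is not None and line.lstrip().startswith("- "):
            sections[current].append(line.lstrip()[2:].strip())
    return {name: items for name, items in sections.items() if items}


def build_release_notes(text: str, min_chars: int = 100) -> str:
    sections = parse_changelog(text)
    if not sections:
        # No generic fallback: an empty or unparseable changelog is an error.
        raise ValueError("changelog contains no recognizable entries")
    total = sum(len(items) for items in sections.values())
    lines = [f"This release includes {total} updates across {len(sections)} sections.", "", "## Key Highlights"]
    for name, items in sections.items():
        lines.extend(f"- **{name}:** {item}" for item in items)
    notes = "\n".join(lines)
    if len(notes) <= min_chars:
        raise ValueError(f"release notes too short ({len(notes)} chars)")
    return notes
```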
2025-11-02 11:08:58 -05:00
Vijay Janapa Reddi
be0694dc2e refactor(maintenance): consolidate release notes scripts into unified tool
Addresses script organization and maintainability:
- Merged generate_release_notes.py and release_notes.py into changelog-releasenotes.py
- Removed deprecated change_log.py (superseded by changelog-releasenotes.py)
- Added diagram-*.pdf to .gitignore (Quarto auto-generated cache files)

This consolidation simplifies the release workflow and eliminates duplicate code.
2025-11-02 11:03:07 -05:00
Vijay Janapa Reddi
7738c8cc03 Fixes answer redistribution in MCQs
Corrects answer redistribution logic in multiple-choice questions to properly update all references to the answer options being swapped, avoiding double-swapping issues.

Addresses an issue where answer text wasn't correctly updated when MCQ answer options were redistributed. It ensures references to option letters (A, B, C, D) are updated in both "The correct answer is X" and "Option X" contexts within the answer text.
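Double-swapping is the classic failure of sequential replacement: applying A→C and then C→A turns every reference back into A. Rewriting all letter references in a single regex pass avoids it. A minimal sketch covering only the two contexts named above; `remap_option_letters` is a hypothetical name:

```python
import re
from typing import Dict

# Letter references in answer text: "correct answer is X" and "Option X".
LETTER_REF = re.compile(r"\b(correct answer is |Option )([A-D])\b")


def remap_option_letters(answer: str, mapping: Dict[str, str]) -> str:
    """Rewrite option letters in one pass so a swap like A<->C is never applied twice."""
    # Letters missing from the mapping are left unchanged.
    return LETTER_REF.sub(lambda m: m.group(1) + mapping.get(m.group(2), m.group(2)), answer)
```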
2025-11-02 10:19:06 -05:00
Vijay Janapa Reddi
53b31cb8b6 chore(test): add TikZ style linter for undefined style detection
Scans all Quarto .tikz blocks, collects defined styles, and flags uses of undefined custom styles such as Line and Box. Heuristics focus on tokens starting with an uppercase letter to minimize false positives from colors and built-in keys. Provides file and line context; exits nonzero on findings.
2025-11-01 13:26:05 -04:00
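The TikZ style linter's heuristic can be sketched as follows. It collects `Name/.style` definitions, then flags capitalized option keys with no matching definition. This is a hypothetical sketch, not the linter's code. It works on raw TikZ text rather than extracting Quarto blocks, checks definitions per file, and does not handle style names containing spaces:

```python
import re
import sys
from typing import List, Set, Tuple

# Definitions such as "Box/.style={...}", "Line/.style args=...", "Box/.append style=...".
STYLE_DEF = re.compile(r"([A-Za-z][\w-]*)/\.(?:append )?style\b")
# Innermost option lists such as [Box, fill=red].
OPTION_LIST = re.compile(r"\[([^\[\]]*)\]")


def undefined_styles(tikz: str) -> List[Tuple[int, str]]:
    """Return (line number, key) for capitalized option keys with no style definition."""
    defined: Set[str] = {m.group(1) for m in STYLE_DEF.finditer(tikz)}
    problems = []
    for lineno, line in enumerate(tikz.splitlines(), start=1):
        for opts in OPTION_LIST.findall(line):
            for item in opts.split(","):
                key = item.split("=", 1)[0].strip()
                # Uppercase-first heuristic: built-in keys and most colors are lowercase.
                if key[:1].isupper() and "/" not in key and key not in defined:
                    problems.append((lineno, key))
    return problems


def main(paths: List[str]) -> int:
    found = 0
    for path in paths:
        with open(path, encoding="utf-8") as fh:
            for lineno, key in undefined_styles(fh.read()):
                print(f"{path}:{lineno}: undefined style '{key}'")
                found += 1
    return 1 if found else 0


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```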
Didier Durand
2249346319 Fixing dangling link in README.md 2025-10-31 07:20:59 +01:00
Vijay Janapa Reddi
5a271b2a3a Adds required dependencies
Adds beautifulsoup4 and requests libraries to the list of
dependencies needed for the genai scripts. These libraries are
required for enhanced functionality in the scripts.
2025-10-25 14:09:37 -04:00
Vijay Janapa Reddi
10efe50b47 Adds self-referential section checker
Implements a script to detect self-referential or circular section
references within Quarto files. This helps identify potential writing
issues where a section refers to itself, its parent, or its child.
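A self-referential check needs the full heading hierarchy before it can classify references, because a section may cite a child that appears later in the file. The sketch below is hypothetical and not the actual script. It assumes Quarto `{#sec-...}` heading IDs and `@sec-...` references, and it reports any ancestor or descendant, which is broader than just parent and child:

```python
import re
from typing import Dict, List, Tuple

HEADING = re.compile(r"^(#{1,6})\s+.*\{#(sec-[\w-]+)[^}]*\}\s*$")
SEC_REF = re.compile(r"@(sec-[\w-]+)")


def self_references(qmd: str) -> List[Tuple[int, str, str, str]]:
    """Return (line, current section, referenced section, relation) for circular refs."""
    ancestors: Dict[str, Tuple[str, ...]] = {}  # section id -> enclosing section ids
    located = []  # (line number, enclosing path, referenced id)
    stack: List[Tuple[int, str]] = []  # (heading level, id) of open sections
    for lineno, line in enumerate(qmd.splitlines(), start=1):
        heading = HEADING.match(line)
        if heading:
            level = len(heading.group(1))
            # Headings at the same or deeper level are closed by this one.
            while stack and stack[-1][0] >= level:
                stack.pop()
            ancestors[heading.group(2)] = tuple(sid for _, sid in stack)
            stack.append((level, heading.group(2)))
        elif stack:
            path = tuple(sid for _, sid in stack)
            located.extend((lineno, path, ref) for ref in SEC_REF.findall(line))

    findings = []
    for lineno, path, ref in located:
        current = path[-1]
        if ref == current:
            relation = "self"
        elif ref in path:
            relation = "ancestor"
        elif current in ancestors.get(ref, ()):
            relation = "descendant"
        else:
            continue
        findings.append((lineno, current, ref, relation))
    return findings
```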
2025-10-17 10:22:07 -04:00
Vijay Janapa Reddi
fd87823d4e enhance: improve changelog generation to focus on user-facing changes
- Enhanced AI prompt to filter out internal infrastructure changes
- Focus on educational improvements that benefit readers and instructors
- Skip entries with only section IDs, formatting, or build system changes
- Prioritize content additions, learning enhancements, and clarity improvements
- Updated changelog with user-focused descriptions since August 6th

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-09 18:18:16 -04:00
Vijay Janapa Reddi
914e4bfb4d fix(citations): add missing bibliography entries and improve validation
Add missing citations to chapter bib files:
- carlini2021extracting to privacy_security.bib
- koomey2011web to frontiers.bib
- quinonero2009dataset to robust_ai.bib

Enhance citation validation script:
- Strip trailing punctuation (.,;:) from citation keys
- Filter out DOI-style citations (e.g., @10.1109/...)
- Prevent false positives from citations like [@key.]

These changes fix all reported citation validation failures while
improving the validation script to handle edge cases better.
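The key-normalization rules above (strip trailing `.,;:`, skip DOI-style `@10.xxxx/...`) can be sketched like this. The regex allows only a subset of the characters Pandoc permits in citation keys, and the lookbehind keeps e-mail addresses out. This is a hypothetical sketch, not the script's implementation:

```python
import re
from typing import Set

# '@' not preceded by a word character or '.', so "user@example.com" is ignored.
CITE = re.compile(r"(?<![\w.])@(\w[\w:./-]*)")
DOI_LIKE = re.compile(r"10\.\d{4,9}/")


def extract_citation_keys(text: str) -> Set[str]:
    """Collect citation keys, normalizing trailing punctuation and skipping DOIs."""
    keys = set()
    for raw in CITE.findall(text):
        # "[@key.]" would otherwise yield "key." and a false "missing" report.
        key = raw.rstrip(".,;:")
        if DOI_LIKE.match(key):
            continue  # DOI-style references are not bibliography keys
        keys.add(key)
    return keys
```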
2025-10-09 15:01:07 -04:00
Vijay Janapa Reddi
a1498f37cd docs(scripts): add citation validation documentation
Add comprehensive documentation for the new citation validation script
and pre-commit hook, including usage examples, troubleshooting, and
integration details.
2025-10-09 14:48:30 -04:00
Vijay Janapa Reddi
4fe88bf456 feat(pre-commit): add citation validation hook
Add new pre-commit hook to validate that all @key citations in .qmd
files have corresponding entries in their .bib files. This catches
missing bibliography entries before they cause Quarto build failures.

Features:
- Validates citations against bibliography files
- Filters out cross-reference labels (fig-, tbl-, sec-, etc.)
- Provides clear error messages with missing citation keys
- Only checks files being committed (not entire codebase)
- Runs in quiet mode to reduce noise

New script: tools/scripts/content/validate_citations.py
Updated: .pre-commit-config.yaml with validate-citations hook
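The core comparison in this hook, bibliography keys versus cited keys with cross-reference labels filtered out, can be sketched as below. The prefix tuple is an assumed subset of Quarto's cross-reference prefixes, since the commit lists only "fig-, tbl-, sec-, etc." The function names are hypothetical. Restricting the check to staged files comes from pre-commit passing only those file names:

```python
import re
from typing import Iterable, Set

# "@article{key," or "@inproceedings{ key ," at the start of a BibTeX entry.
BIB_ENTRY = re.compile(r"@\w+\s*\{\s*([^,\s]+)\s*,")
# Assumed subset of Quarto cross-reference prefixes; not the hook's actual list.
CROSSREF_PREFIXES = ("fig-", "tbl-", "sec-", "eq-", "lst-")


def bib_keys(bib_text: str) -> Set[str]:
    return set(BIB_ENTRY.findall(bib_text))


def missing_citations(cited: Iterable[str], bib_text: str) -> Set[str]:
    """Return cited keys that are neither cross-reference labels nor bibliography entries."""
    known = bib_keys(bib_text)
    return {k for k in cited if not k.startswith(CROSSREF_PREFIXES) and k not in known}
```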
2025-10-09 14:47:35 -04:00
Vijay Janapa Reddi
26d3ba57bc fix(scripts): correct workspace root path calculation in format_tables.py
Fix path traversal from 3 to 4 parent directories to correctly locate
workspace root when script is at tools/scripts/content/format_tables.py.

This fixes the pre-commit hook error where it was looking for files at
/tools/quarto/contents instead of /quarto/contents.
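The off-by-one is easy to see with `pathlib`. The `/repo` prefix is a made-up workspace location for illustration:

```python
from pathlib import PurePosixPath

script = PurePosixPath("/repo/tools/scripts/content/format_tables.py")

# parents[0] is content/, [1] scripts/, [2] tools/, [3] the workspace root.
three_up = script.parent.parent.parent  # /repo/tools: the old, wrong "root"
workspace_root = script.parents[3]  # /repo
contents_dir = workspace_root / "quarto" / "contents"
```

In the script itself, the equivalent would be `Path(__file__).resolve().parents[3]`. `resolve()` makes the result independent of the directory the hook runs from.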
2025-10-09 13:51:41 -04:00
Vijay Janapa Reddi
3b37726b27 refactor(tools): reorganize scripts directory structure for better maintainability
Consolidated 21 root-level scripts into logical subdirectories:

New structure:
- images/: All image management scripts (10 files consolidated from 3 locations)
- infrastructure/: CI/CD and container scripts (3 files)
- content/: Added formatting scripts (3 files moved from root)
- testing/: All test scripts (5 files consolidated)
- glossary/: Added standardize_glossaries.py
- maintenance/: Added generate_release_notes.py, preflight.py
- utilities/: Added validation scripts

Benefits:
- Reduced root-level clutter (21 → 2 files)
- Related scripts grouped logically
- Easier to find and maintain scripts
- Follows standard project organization patterns

Changes:
- Created new subdirectories: images/, infrastructure/
- Moved scripts from root to appropriate subdirectories
- Consolidated scattered scripts (images were in 3 places)
- Updated all pre-commit hook references
- Created README files for new directories
- Included backup file for rollback if needed

Tool: tools/scripts/reorganize_scripts.py (for future reference)
2025-10-09 13:36:16 -04:00
Vijay Janapa Reddi
2e3930b5d1 feat(tools): add markdown list formatting checker and auto-fixer
Created check_list_formatting.py to enforce proper markdown list formatting:
- Detects bullet lists without preceding blank lines
- Auto-fixes issues with --fix flag
- Supports --check mode for CI/CD validation
- Can process single files or directories recursively
- Comprehensive documentation in README_LIST_FORMATTING.md

This tool ensures markdown renders correctly across all parsers
(Quarto, GitHub, etc.) by requiring empty lines before bullet lists.

Tool location: tools/scripts/utilities/check_list_formatting.py
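The detection and `--fix` logic can be sketched in a single pass that tracks list and code-fence state. Tracking list state means wrapped continuation lines inside a list are not mistaken for preceding paragraphs. This is a hypothetical sketch; directory recursion and the `--check` exit code of check_list_formatting.py are not shown:

```python
import re
from typing import List

BULLET = re.compile(r"^\s*[-*+]\s+")


def find_unspaced_lists(lines: List[str]) -> List[int]:
    """Return 1-based line numbers of bullets that open a list without a blank line before."""
    issues = []
    in_fence = False
    in_list = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("```"):
            in_fence = not in_fence  # ignore anything inside fenced code
            in_list = False
            continue
        if in_fence:
            continue
        if not stripped:
            in_list = False
            continue
        if BULLET.match(line):
            if not in_list and i > 0 and lines[i - 1].strip():
                issues.append(i + 1)
            in_list = True
    return issues


def fix_unspaced_lists(text: str) -> str:
    lines = text.split("\n")
    # Insert from the bottom up so earlier line numbers stay valid.
    for lineno in reversed(find_unspaced_lists(lines)):
        lines.insert(lineno - 1, "")
    return "\n".join(lines)
```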
2025-10-09 13:35:56 -04:00