cs249r_book

mirror of https://github.com/harvard-edge/cs249r_book.git synced 2026-05-06 17:49:07 -05:00

Author	SHA1	Message	Date
github-actions[bot]	fa7ed15edd	docs: add @farhan523 as contributor for code (staffml)	2026-04-28 15:34:58 +00:00
Vijay Janapa Reddi	98f45a5ae4	fix(docs): repair internal links for corpus build artifact and book SCSS - interviews/README: stop linking gitignored vault/corpus.json; note vault build - shared/styles/BRAND: point to style-vol1/2 instead of nonexistent style.scss	2026-04-26 11:27:40 -04:00
github-actions[bot]	4802ab9da6	docs: add @Shashank-Tripathi-07 as contributor for bug, code (book, tinytorch, staffml, labs)	2026-04-25 18:59:16 +00:00
Vijay Janapa Reddi	9f20e7f20d	docs(readmes): add language hints to bare code fences (markdownlint MD040) Add `text` language tag to 25 unlabeled fenced code blocks across the public-facing READMEs. Mostly directory-tree listings, all-contributors bot instructions, and pseudo-output ASCII blocks — none were getting syntax highlighting anyway, but the explicit tag silences markdownlint MD040 and signals intent ("this is plain text, not a forgotten lang").	2026-04-22 16:56:08 -04:00
Vijay Janapa Reddi	434417d69f	docs(readmes): force table width via inline style (override GitHub CSS) GitHub's github-markdown-css applies: .markdown-body table { display: block; width: max-content; max-width: 100%; } The HTML width="100%" attribute is a presentational hint with lower specificity than the class selector, so tables with short cell content were sizing to max-content and not stretching to fill the column. Tables with long sentences per cell stretched fine, masking the bug. Add inline style="width:100%" (specificity 1,0,0,0) which overrides the class-selector rule. Keep width="100%" attribute as a fallback for non-GitHub renderers (VSCode preview, GitLab, plain HTML viewers). 54 tables updated across 10 READMEs + the two contributor-sync scripts that regenerate auto-managed tables.	2026-04-22 16:20:38 -04:00
Vijay Janapa Reddi	eb27858591	docs(readmes): replace HTML card pattern with native GitHub callouts The sub-project READMEs used an old-school nested-table card design with hardcoded bgcolor="#ffffff", "#cfd6dd", "#eef2f7" plus deprecated HTML4 attributes (cellpadding, cellspacing, border). It looked good in light mode but produced harsh white islands in GitHub's dark theme, which is what most readers see today. Across 11 sub-READMEs: - Strip the card wrapper so data tables are just clean <table width="100%"> with semantic <thead>/<tbody>. Headers keep their column widths; bgcolor/valign/zebra-stripe cruft is removed (GitHub provides its own theme-aware striping). - Convert the early-release callouts (and mlperf-edu's two-tier status block + "source of truth" note + interviews' two info boxes) to GitHub-native > [!NOTE] / > [!WARNING] / > [!TIP] callouts. These are theme-aware, get proper icons, and render correctly in light AND dark mode. Net result: 528 lines of HTML cruft removed, 230 lines of clean markdown added. Visual identity is preserved (callouts still stand out, tables still stretch full-width) while becoming dark-mode safe and consistent with the main README.	2026-04-22 16:12:20 -04:00
Vijay Janapa Reddi	b9ee88ca70	docs(readmes): stretch HTML tables to full width Add `width="100%"` to every HTML content and contributor table across all project READMEs so they render full-width on GitHub instead of collapsing to natural content width. Cell-level `width="X%"` percentages were already in place but only take effect once the table itself has an explicit width. Also update the contributor-sync scripts so the auto-generated tables stay consistent on the next bot run: - .github/workflows/contributors/generate_main_readme.py - .github/workflows/contributors/generate_readme_tables.py Scope: 27 files, 85 tables. Sub-project READMEs that already use the "card" pattern (labs/, kits/ content sections with <table width="98%"> wrappers) are intentionally untouched.	2026-04-22 16:01:54 -04:00
Vijay Janapa Reddi	59ecd34f51	docs(readme): standardize wide HTML tables across product READMEs - Add wrap_readme_data_tables.py to frame <table>+<thead>/<tbody> blocks in a 98% width panel (#cfd6dd border, #eef2f7 headers, zebra body rows where applied manually in converted tables). - Apply wraps to book, kits, labs, slides, tinytorch; tbody wraps for kits docs/related and instructors overview. - Convert remaining Markdown tables in mlsysim, mlperf-edu, and interviews to the same HTML pattern; replace StaffML markdown callouts with HTML panels. - Add thead rows to kits/instructors body-only tables for clearer hierarchy.	2026-04-21 08:51:04 -04:00
Vijay Janapa Reddi	d569bfca47	docs(readme): use HTML callouts for 2026 early-release banners Replace markdown blockquotes with a shared centered table pattern (cellpadding, bgcolor panel, h3 + aligned paragraphs) so GitHub renders consistent spacing. Align labs and mlsysim DEV-BANNER with the same layout and 2026 messaging.	2026-04-21 08:26:06 -04:00
Vijay Janapa Reddi	27f4304e0b	docs(readme): add consistent 2026 early-release banners for iterating projects Use a short top-of-README callout for periodic-table, StaffML, TinyTorch, slides, and instructors: live with the 2026 release, expect steady iteration, link to GitHub issues. Slides banner replaces dev-only wording with the same framing while keeping dev/live badges.	2026-04-21 08:24:13 -04:00
Vijay Janapa Reddi	335e134c4a	docs(readmes): standardize sub-project Contributors sections Audit of the eight sub-project READMEs showed inconsistent surrounding text around the auto-managed contributor table — some had a Legend line, some didn't; instructors used a different heading style; interviews was missing the thanks blurb and CTA; mlsysim was missing the END marker and the recognition CTA. Standardize all eight to the same template: heading + thanks blurb + legend + ALL-CONTRIBUTORS markers + Recognize-a-contributor CTA. Per-project quirks preserved: interviews keeps its closing author sign-off paragraph; CTA wording stays project-specific (e.g. labs suggests "code, tutorial, test, or doc" while kits suggests "tool, test, video, or doc"). Per-project sections kept (not consolidated into root). Sub-READMEs are landing pages, sub-projects could be extracted to standalone repos later, and recognition is more meaningful where the work lives. The root README continues to aggregate everything via the existing sectioned tables.	2026-04-19 11:19:20 -04:00
Vijay Janapa Reddi	23e27b2f40	ci(contributors): wire slides + instructors into all-contributors Closes the gap where Slides and Instructor Site were first-class Quarto sites but invisible to the contributor recognition pipeline, and fixes two pre-existing holes for mlsysim/interviews. Workflows - all-contributors-add.yml: add `slides` and `instructors` to PROJECTS; add `slide`/`instructor` aliases. Tighten the LLM prompt with explicit multi-type and emoji/punctuation rules, and add a deterministic regex fallback that scans the trigger comment for type keywords and unions them with the LLM result so a flaky classification never drops a tag. - update-contributors.yml: add `mlsysim`, `interviews`, `slides`, `instructors` to the push trigger paths and to the file/commit lists, so edits to those configs actually rebuild and push READMEs. Generators - generate_main_readme.py: refactor the per-section block to a single PROJECT_SECTIONS table so adding a project is one line; add Slides and Instructor Site sections. - generate_readme_tables.py: register `slides` and `instructors`. Configs / READMEs - New `slides/.all-contributorsrc` and `instructors/.all-contributorsrc` seeded with profvjreddi. - Add ALL-CONTRIBUTORS-LIST markers + recognize-a-contributor blurb to `slides/README.md` and `instructors/README.md`. - Regenerate root, slides, instructors, and (sorted-badge drift) the interviews README via the generators. Docs - Refresh `.github/workflows/contributors/README.md` to list all 8 projects and document the canonical-list-in-PROJECTS contract.	2026-04-19 11:02:37 -04:00
Vijay Janapa Reddi	ff7b4af4b6	docs(readme): update question count 5,700+ → 9,000+ Post-migration the corpus holds 9,199 published questions (the paper's number; site pre-migration reported 8,053 due to the v1 filter-predicate bug). Rounded display count bumped everywhere in the top-level interviews/README.md to reflect the accurate current state.	2026-04-16 16:14:39 -04:00
Vijay Janapa Reddi	c530b709a2	fix(staffml): improve README sample question formatting Restructure sample questions using collapsible <details> with bold summary lines and <blockquote> for question text. Each track now has a clean visual hierarchy: track heading → collapsible questions → indented answers with napkin math.	2026-04-01 09:29:26 -04:00
Vijay Janapa Reddi	fb87570b3f	feat(site): deploy StaffML to /staffml/, promote v3 landing page - Move StaffML app from /interviews/ to /staffml/ (basePath, destination_dir) - Add /interviews/ → /staffml/ redirect for old URL support - Add staffml to site deploy skip list - Upgrade interviews/README.md with star funnel hero + Launch StaffML CTA - Promote v3 rich-card landing page to default index.qmd - Archive original landing as index-v1.qmd - Fix navbar wrapping: icon-only below 1400px, no-wrap on right-side items - Update star CTA copy: "helps others discover" (inclusive wording)	2026-04-01 09:16:44 -04:00
Vijay Janapa Reddi	3ba5345dbf	staffml: enhance README with 15 sample questions, chains, vault stats - 15 hand-picked questions across all tracks and Bloom levels - Each with model answer + napkin math in collapsible details - Depth chains section with example L1→L6+ memory chain - Vault stats table (5,700+ Qs, 1,000+ chains, 650+ concepts) - CTA linking to the full app	2026-03-25 19:52:15 -04:00
Vijay Janapa Reddi	26e0ab3856	restructure interviews/ with vault separation and per-directory licenses - Move corpus, taxonomy, chains, scripts into interviews/vault/ - Rename interviews/staffml/ (was interviews/staffml/) as the branded app - Add CC BY-NC-SA 4.0 LICENSE to: book, kits, labs, slides, instructors, interviews - Add AGPL-3.0 LICENSE to interviews/staffml/ (the app) - Add vault LICENSE for pipeline scripts - Update all GitHub Actions workflows for new paths - Update README links and vault.yaml export paths - Fix regex patterns in site/book deploy workflows License structure: interviews/LICENSE — CC BY-NC-SA 4.0 (corpus + data) interviews/staffml/LICENSE — AGPL-3.0 (app code) interviews/vault/LICENSE — pipeline copyright book|kits|labs|slides|instructors/LICENSE — CC BY-NC-SA 4.0 tinytorch/LICENSE — Apache 2.0 (unchanged)	2026-03-25 15:18:14 -04:00
Vijay Janapa Reddi	a63a7ac484	staffml: rewrite interviews README, fix all broken links, slate SVG color - Rewrote README to reflect current structure (no more cloud/, edge/ dirs) - Removed 10 broken links to non-existent files - Updated title to "StaffML: ML Systems Interview Playbook" - Added development instructions and CI/CD reference - Curriculum map SVG: slate color for StaffML box (#475569)	2026-03-25 13:11:20 -04:00
Vijay Janapa Reddi	0e09491a85	staffml: add smoke tests to CI, build badge on README - Smoke tests in both dev and live workflows: corpus integrity, required fields, valid levels, taxonomy size, manifest consistency, static assets - Build fails if critical pages missing or data is malformed - README: StaffML build badge, updated question count and platform status	2026-03-25 09:22:34 -04:00
Vijay Janapa Reddi	88e52d52b5	fix(interviews): data hygiene pass — 100% competency mapping, dedup, parser fixes - Expand NORMALIZE_MAP from 265→1,098 entries for 100% topic→area coverage - Fix build_corpus.py: robust L6+ level extraction, strip trailing quotes, populate competency_area field via taxonomy mapping - Remove 12 duplicate question pairs from mobile/tinyml source markdown - Fix corrupted L6+ badge URLs in cloud/02 and mobile/01 - Update all READMEs with accurate post-dedup counts (3,180 questions) - Create AIG pipeline SVG diagram (interviews/images/svg/aig-pipeline.svg) - Sync enriched corpus to StaffML app, verify build passes	2026-03-24 08:49:44 -04:00
Vijay Janapa Reddi	e9aadf39a7	feat(interviews): vault loop scales corpus to 3,192 questions with balance optimization 20-round overnight generation loop filled the 3D coverage cube (track × level × competency) from 1,964 to 3,192 questions. - vault_loop.py: autonomous generation with balance detection (CV metric) - Cloud: 819 questions, Edge: 811, Mobile: 755, TinyML: 768 - Deficit reduced from 119 to 96 cells, imbalance from 66 to 47 - README updated: 1,200+ → 3,200+ - Corpus synced to StaffML app, meta descriptions updated to 3,200+ - GENERATION_PIPELINE.md: documents the AIG methodology	2026-03-24 08:49:44 -04:00
Vijay Janapa Reddi	7142be5e82	feat(interviews): add AIG question generation engine + 486 new questions Build a complete Automatic Item Generation (AIG) pipeline for the StaffML interview question corpus. Grounded in psychometric literature: Bloom's Revised Taxonomy, Evidence-Centered Design, and distractor theory. Engine (interviews/engine/): - schemas.py: Pydantic models with provenance tracking - bloom.py: 6 cognitive levels with verb stems + question templates - generate.py: Gemini Pro via CLI with structured JSON output - validate.py: 7-gate validation (solver, arithmetic, dedup, readability, specificity) - embed.py: ChromaDB + nomic-embed-text-v1.5 for gap analysis + dedup - taxonomy.py: 12 competency areas, 101 canonical tags, weighted target matrix - quality.py: 5-check programmatic quality suite - report.py: Interactive HTML report with UMAP, BERTopic, 3D coverage cube - cli.py: Rich CLI with progress bars, panels, tables - generate.py (root): One-command runner with saturation detection Corpus changes: - 1,116 to 1,602 questions (+486 generated, 44% growth) - L1/L2 coverage: 47 to 315 questions (7x increase) - 100% 2D coverage (track x level), 66% weighted 3D coverage - Cleaned headers: stripped Bloom jargon, added L1/L2 sections - Normalized all tags to canonical 101-tag taxonomy - Fixed hardcoded path in build_corpus.py	2026-03-24 08:49:43 -04:00
Vijay Janapa Reddi	1f6a8d9672	docs(interviews): update README and numbers tracking	2026-03-21 08:29:55 -04:00
Vijay Janapa Reddi	0f33255b59	refactor(interviews): reorganize 1,063 questions by system scope Restructure all 4 tracks from arbitrary round-based files to learner-journey-based scopes. Each file represents the system the student is reasoning about, with competency sub-sections and L3→L6+ mastery levels inside. Cloud: Single Machine → Distributed Systems → Serving Stack → Production Ops Edge: Hardware Platform → Real-Time Pipeline → Deployed System Mobile: Device & SoC → App Experience → Ship & Update TinyML: Microcontroller → Sensing Pipeline → Deployed Device Old round files preserved in _legacy/ folders. All cross-references updated in README, STUDY_GUIDE, TOPIC_MAP, _quarto.yml, and index.qmd.	2026-03-20 10:40:55 -04:00
Vijay Janapa Reddi	49e73bc3ae	docs: update interview playbook count to 1000+	2026-03-18 16:48:07 -04:00
Vijay Janapa Reddi	d8addae4b0	docs: expand interview questions across all tracks and add study guide Add new questions to cloud, edge, mobile, and tinyml tracks. Update NUMBERS, TOPIC_MAP, and README metadata. Add 4-week STUDY_GUIDE.	2026-03-18 14:51:08 -04:00
Vijay Janapa Reddi	62c8d8815e	interview updates	2026-03-17 17:30:13 -04:00
Vijay Janapa Reddi	338fd8f911	docs: correct mathematical inaccuracies in interview flashcards Fix several mathematical calculations in the system design interview flashcards: - Correct pipeline bubble fraction formula to (P-1)/(M+P-1) - Fix Erlang M/M/1 queuing length calculation (L vs Lq) - Fix speculative decoding pass count calculation - Correct TinyML BLE gateway bandwidth math (5 gateways, not 1) - Correct mobile monitoring storage math (1 byte vs 0.1 KB)	2026-03-17 12:52:43 -04:00
Vijay Janapa Reddi	aadaf5b13a	docs: convert all README markdown tables to HTML format Standardize table formatting across 25 README files to use HTML tables with consistent styling (thead/tbody, column widths, bold labels) matching the main README's presentation.	2026-03-17 08:57:21 -04:00
Vijay Janapa Reddi	9d70bfd9be	feat(interviews): restructure playbook into 4 deployment tracks with 130+ questions Reorganize the ML Systems Interview Playbook from a flat structure into a 2D matrix of mastery levels (L3-L6+) and deployment tracks (Cloud, Edge, Mobile, TinyML). Add TOPIC_MAP.md as the master planning document defining 10 universal competency areas and their track-specific manifestations. New content: - cloud/06_Advanced_Systems.md: 14 questions (FlashAttention, KV-cache, FP16/BF16, MoE memory, power/thermal, security/fairness) - edge/02_Edge_Advanced.md: 18 questions (roofline, DRAM budgeting, WCET, thermal throttling, pruning, adversarial patches) - mobile/02_Mobile_Advanced.md: 18 questions (NPU delegation, CoreML FP16 precision bugs, on-device LLM, battery, federated learning) - tinyml/02_TinyML_Advanced.md: 18 questions (MAC budget, tensor arena, quantization, NAS, power harvesting, FOTA, flash extraction) Expert-validated improvements: - Edge thermal throttling corrected to 80°C (per NVIDIA thermal guide) - Mobile precision question updated with real CoreML FP16 bugs (Mish activation errors, BatchNorm epsilon type mismatch from coremltools issues #2359, #2470, #2625) - Cloud KV-cache question extended with Llama-2-70B scaling example - TinyML operator support notes expanded to cover STM32Cube.AI	2026-03-16 19:16:45 -04:00
Vijay Janapa Reddi	3eb47a6368	docs(README): clarify curriculum structure and fix link integrity Add "Why One Repository" section with authorial voice explaining the integrated curriculum design. Replace flat component list with tiered curriculum diagram (SVG) showing how textbook, labs, TinyTorch, MLSys·im, hardware kits, and interview playbook connect. Split tables into "For Students" and "For Educators" sections. Fix links so live components point to mlsysbook.ai and in-development components point to repo READMEs. Add dev banners to component READMEs (kits, mlsysim, labs, interviews, slides). Update branch guide to clarify what is live at mlsysbook.ai vs under development on dev.	2026-03-16 18:57:18 -04:00
Vijay Janapa Reddi	7e022c115a	feat(interviews): overhaul ML Systems Interview Playbook Rebrand from "Interview Hub" to "The ML Systems Interview Playbook" and restructure for depth, navigation, and educational value. - Add "Numbers Every ML Systems Engineer Should Know" reference table grounded in the textbook's constants.py (invariants, scaling rules, hardware snapshot) - Enhance all 36 flashcard questions with Common Mistake, Napkin Math, and Key Equation fields where applicable - Expand Round 5 (Visual Architecture Debugging) from 2 to 7 Mermaid diagram challenges - Strengthen Architect's Rubric from 3 to 6 evaluation axes with scoring guide - Add topic sections within each round, sorted by difficulty - Add inline topic tags and cross-cutting Topic Index in README - Delete duplicate 06_System_Design_Rubric.md - Feature playbook in root README (top nav, Learning Stack, Start Here)	2026-03-16 16:08:31 -04:00
Vijay Janapa Reddi	a2000fea41	style(vol2): manually fix overlapping text and overly thick arrows in diagrams	2026-03-16 16:08:31 -04:00
Vijay Janapa Reddi	dd0c7c2841	docs(interviews): add Round 5 Visual Debugging to Hub index	2026-03-16 08:57:46 -04:00
Vijay Janapa Reddi	d36954b5ac	docs(interviews): remove community LLM prompt to maintain focus on seed data	2026-03-15 18:22:38 -04:00
Vijay Janapa Reddi	f078a6f8dc	docs(interviews): add personal mission note to Hub README	2026-03-15 18:12:51 -04:00
Vijay Janapa Reddi	5550760cfb	refactor(interviews): restructure hub to match Meta/OpenAI 4-round system design loop - Delete old textbook-based buckets. - Introduce 4 industry-aligned 'Rounds': Single-Node Physics, Distributed Infrastructure, Production Serving, and Operations/Economics. - Migrate and adapt seeded questions into the new Round format. - Update Hub README to emphasize 'Systems-First' hiring philosophy.	2026-03-15 18:10:19 -04:00
Vijay Janapa Reddi	c2b90ae030	feat(interviews): systematically seed hub with textbook-derived tiered questions - Implement Level 1 (Screen), Level 2 (Architect), and Level 3 (Lead) taxonomy. - Extract all seed questions directly from Volume I and Volume II chapter math/callouts. - Remove generic 'AI slop' questions and replace with high-signal 'Silicon Realist' physics. - Update Hub README to explain the Funnel of Mastery.	2026-03-15 18:01:41 -04:00
Vijay Janapa Reddi	6d3c9e2ca4	refactor: restore high-value pedagogical structure to Interview Hub - Restore 'Realistic Solution' format for all questions. - Re-integrate 'Whiteboard Challenges' to Hub landing page. - Re-enable 'Deep Dive' links to textbook chapters. - Maintain minimalist 'Hub' branding while preserving study-path depth.	2026-03-15 17:43:34 -04:00
Vijay Janapa Reddi	41799ef4c8	refactor: convert Interview Hub into a minimalist question collection - Remove marketing branding, 'Deep Dive' links, and 'Walls' cheatsheet. - Simplify Interview Hub into professional question/answer categories. - Ensure all content is technical and free of AI-generated commentary.	2026-03-15 17:41:35 -04:00
Vijay Janapa Reddi	98d795e17c	docs: add WIP disclaimer to Interview Hub and improve lab/system metadata - Add 'Work in Progress' note to interviews/README.md. - Standardize lab terminology (Act -> Part) in orientation lab. - Enhance scaling law constants with SystemAssumption metadata in mlsysim.	2026-03-15 17:23:01 -04:00
Vijay Janapa Reddi	a6e3fe95f9	feat: launch AI Systems Interview Hub & LeetCode Practice Arena - Implement 'The Blueprint' Interview Guide and Flashcard Hub. - Add 'AI Systems Arena' for LeetCode-style design challenges. - Integrate 'mlsysim audit' CLI for local hardware profiling. - Setup GitHub Actions for automated contributor recognition and welcome messages. - Add dynamic 'Trending Questions' leaderboard based on community upvotes. - Update root README and main landing page for practitioner focus.	2026-03-15 17:21:39 -04:00
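
Commit 23e27b2f40 above describes a deterministic regex fallback that scans the trigger comment for type keywords and unions them with the LLM result, so a flaky classification never drops a tag. The workflow's actual code is not shown in this log, so the following is only a minimal sketch of that idea; the keyword list and the function names `keyword_types` and `merge_types` are hypothetical.

```python
import re

# Hypothetical keyword list; the real workflow's contribution types are not given in this log.
TYPE_KEYWORDS = ("bug", "code", "doc", "test", "tutorial")


def keyword_types(comment):
    """Return every known type keyword that appears as a whole word in the comment."""
    found = set()
    for keyword in TYPE_KEYWORDS:
        # \b anchors keep "code" from matching inside longer words such as "decoder".
        if re.search(rf"\b{re.escape(keyword)}\b", comment, flags=re.IGNORECASE):
            found.add(keyword)
    return found


def merge_types(llm_types, comment):
    """Union the (possibly incomplete) LLM classification with the regex matches."""
    return sorted(set(llm_types) | keyword_types(comment))


if __name__ == "__main__":
    comment = "@all-contributors please add @someone for bug, code"
    # The LLM result here is a stand-in that missed "bug"; the fallback restores it.
    print(merge_types(["code"], comment))
```

Taking the union means the fallback can only add tags, never remove one the classifier produced, which matches the commit's stated goal of never dropping a tag.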

42 Commits
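
Commit 338fd8f911 corrects the pipeline bubble fraction formula to (P-1)/(M+P-1). The commit message does not define the symbols; the sketch below assumes the usual convention that P is the number of pipeline stages and M is the number of micro-batches. The function name is illustrative and does not come from the repository.

```python
from fractions import Fraction


def pipeline_bubble_fraction(stages, microbatches):
    """Idle fraction of a pipeline schedule: (P - 1) / (M + P - 1).

    Assumes P = number of pipeline stages and M = number of micro-batches;
    the commit message itself does not define the symbols.
    """
    if stages < 1 or microbatches < 1:
        raise ValueError("stages and microbatches must be positive integers")
    return Fraction(stages - 1, microbatches + stages - 1)


if __name__ == "__main__":
    # With 4 stages, raising the micro-batch count shrinks the bubble.
    for m in (1, 8, 32):
        frac = pipeline_bubble_fraction(4, m)
        print(f"P=4, M={m}: {frac} = {float(frac):.3f}")
```

Under that assumption, a single-stage pipeline has no bubble, and for a fixed number of stages the fraction falls toward zero as the number of micro-batches grows.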