42 Commits

Author SHA1 Message Date
github-actions[bot]
fa7ed15edd docs: add @farhan523 as contributor for code (staffml) 2026-04-28 15:34:58 +00:00
Vijay Janapa Reddi
98f45a5ae4 fix(docs): repair internal links for corpus build artifact and book SCSS
- interviews/README: stop linking gitignored vault/corpus.json; note vault build
- shared/styles/BRAND: point to style-vol1/2 instead of nonexistent style.scss
2026-04-26 11:27:40 -04:00
github-actions[bot]
4802ab9da6 docs: add @Shashank-Tripathi-07 as contributor for bug, code (book, tinytorch, staffml, labs) 2026-04-25 18:59:16 +00:00
Vijay Janapa Reddi
9f20e7f20d docs(readmes): add language hints to bare code fences (markdownlint MD040)
Add `text` language tag to 25 unlabeled fenced code blocks across the
public-facing READMEs. Mostly directory-tree listings, all-contributors
bot instructions, and pseudo-output ASCII blocks — none were getting
syntax highlighting anyway, but the explicit tag silences markdownlint
MD040 and signals intent ("this is plain text, not a forgotten lang").
2026-04-22 16:56:08 -04:00
Vijay Janapa Reddi
434417d69f docs(readmes): force table width via inline style (override GitHub CSS)
GitHub's github-markdown-css applies:
  .markdown-body table { display: block; width: max-content; max-width: 100%; }

The HTML width="100%" attribute is a presentational hint with lower
specificity than the class selector, so tables with short cell content
were sizing to max-content and not stretching to fill the column.
Tables with long sentences per cell stretched fine, masking the bug.

Add inline style="width:100%" (specificity 1,0,0,0) which overrides
the class-selector rule. Keep width="100%" attribute as a fallback for
non-GitHub renderers (VSCode preview, GitLab, plain HTML viewers).

54 tables updated across 10 READMEs + the two contributor-sync scripts
that regenerate auto-managed tables.
2026-04-22 16:20:38 -04:00
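The cascade argument in the commit above can be sketched by comparing specificity tuples. This is a simplified model, assuming all three declarations are author-level and none use `!important`; the tuple layout (inline, ids, classes, types) and the names below are illustrative and do not come from the repository.

```python
# Simplified model of CSS specificity as (inline, ids, classes, types) tuples.
# Assumption: all declarations share the same origin and importance.
declarations = [
    # Presentational hint from the HTML width attribute: specificity zero.
    ("width attribute", (0, 0, 0, 0), "100%"),
    # GitHub's rule `.markdown-body table`: one class plus one type selector.
    (".markdown-body table", (0, 0, 1, 1), "max-content"),
    # Inline style="width:100%".
    ("inline style", (1, 0, 0, 0), "100%"),
]


def winning_declaration(decls):
    """Return the declaration with the highest specificity tuple."""
    return max(decls, key=lambda d: d[1])


# Before the fix only the attribute and the class rule compete.
without_inline = winning_declaration(declarations[:2])
# After the fix the inline style joins the cascade and wins.
with_inline = winning_declaration(declarations)

print(without_inline[0], "->", without_inline[2])
print(with_inline[0], "->", with_inline[2])
```

Python compares tuples left to right, which mirrors how specificity components are ranked, so the inline declaration wins regardless of how many classes the competing selector carries.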
Vijay Janapa Reddi
eb27858591 docs(readmes): replace HTML card pattern with native GitHub callouts
The sub-project READMEs used an old-school nested-table card design
with hardcoded bgcolor="#ffffff", "#cfd6dd", "#eef2f7" plus deprecated
HTML4 attributes (cellpadding, cellspacing, border). It looked good in
light mode but produced harsh white islands in GitHub's dark theme,
which is what most readers see today.

Across 11 sub-READMEs:

- Strip the card wrapper so data tables are just clean
  <table width="100%"> with semantic <thead>/<tbody>. Headers keep
  their column widths; bgcolor/valign/zebra-stripe cruft is removed
  (GitHub provides its own theme-aware striping).
- Convert the early-release callouts (and mlperf-edu's two-tier
  status block + "source of truth" note + interviews' two info boxes)
  to GitHub-native > [!NOTE] / > [!WARNING] / > [!TIP] callouts.
  These are theme-aware, get proper icons, and render correctly in
  light AND dark mode.

Net result: 528 lines of HTML cruft removed, 230 lines of clean
markdown added. Visual identity is preserved (callouts still stand
out, tables still stretch full-width) while becoming dark-mode safe
and consistent with the main README.
2026-04-22 16:12:20 -04:00
Vijay Janapa Reddi
b9ee88ca70 docs(readmes): stretch HTML tables to full width
Add `width="100%"` to every HTML content and contributor table across all
project READMEs so they render full-width on GitHub instead of collapsing
to natural content width. Cell-level `width="X%"` percentages were already
in place but only take effect once the table itself has an explicit width.

Also update the contributor-sync scripts so the auto-generated tables stay
consistent on the next bot run:
  - .github/workflows/contributors/generate_main_readme.py
  - .github/workflows/contributors/generate_readme_tables.py

Scope: 27 files, 85 tables. Sub-project READMEs that already use the
"card" pattern (labs/, kits/ content sections with <table width="98%">
wrappers) are intentionally untouched.
2026-04-22 16:01:54 -04:00
Vijay Janapa Reddi
59ecd34f51 docs(readme): standardize wide HTML tables across product READMEs
- Add wrap_readme_data_tables.py to frame <table>+<thead>/<tbody> blocks in a
  98% width panel (#cfd6dd border, #eef2f7 headers, zebra body rows where
  applied manually in converted tables).
- Apply wraps to book, kits, labs, slides, tinytorch; tbody wraps for kits
  docs/related and instructors overview.
- Convert remaining Markdown tables in mlsysim, mlperf-edu, and interviews to
  the same HTML pattern; replace StaffML markdown callouts with HTML panels.
- Add thead rows to kits/instructors body-only tables for clearer hierarchy.
2026-04-21 08:51:04 -04:00
Vijay Janapa Reddi
d569bfca47 docs(readme): use HTML callouts for 2026 early-release banners
Replace markdown blockquotes with a shared centered table pattern
(cellpadding, bgcolor panel, h3 + aligned paragraphs) so GitHub renders
consistent spacing. Align labs and mlsysim DEV-BANNER with the same layout
and 2026 messaging.
2026-04-21 08:26:06 -04:00
Vijay Janapa Reddi
27f4304e0b docs(readme): add consistent 2026 early-release banners for iterating projects
Use a short top-of-README callout for periodic-table, StaffML, TinyTorch,
slides, and instructors: live with the 2026 release, expect steady iteration,
link to GitHub issues. Slides banner replaces dev-only wording with the same
framing while keeping dev/live badges.
2026-04-21 08:24:13 -04:00
Vijay Janapa Reddi
335e134c4a docs(readmes): standardize sub-project Contributors sections
Audit of the eight sub-project READMEs showed inconsistent surrounding
text around the auto-managed contributor table — some had a Legend line,
some didn't; instructors used a different heading style; interviews was
missing the thanks blurb and CTA; mlsysim was missing the END marker
and the recognition CTA.

Standardize all eight to the same template: heading + thanks blurb +
legend + ALL-CONTRIBUTORS markers + Recognize-a-contributor CTA.

Per-project quirks preserved: interviews keeps its closing author
sign-off paragraph; CTA wording stays project-specific (e.g. labs
suggests "code, tutorial, test, or doc" while kits suggests
"tool, test, video, or doc").

Per-project sections kept (not consolidated into root). Sub-READMEs
are landing pages, sub-projects could be extracted to standalone repos
later, and recognition is more meaningful where the work lives. The
root README continues to aggregate everything via the existing
sectioned tables.
2026-04-19 11:19:20 -04:00
Vijay Janapa Reddi
23e27b2f40 ci(contributors): wire slides + instructors into all-contributors
Closes the gap where Slides and Instructor Site were first-class Quarto
sites but invisible to the contributor recognition pipeline, and fixes
two pre-existing holes for mlsysim/interviews.

Workflows
- all-contributors-add.yml: add `slides` and `instructors` to PROJECTS;
  add `slide`/`instructor` aliases. Tighten the LLM prompt with explicit
  multi-type and emoji/punctuation rules, and add a deterministic regex
  fallback that scans the trigger comment for type keywords and unions
  them with the LLM result so a flaky classification never drops a tag.
- update-contributors.yml: add `mlsysim`, `interviews`, `slides`,
  `instructors` to the push trigger paths and to the file/commit lists,
  so edits to those configs actually rebuild and push READMEs.

Generators
- generate_main_readme.py: refactor the per-section block to a single
  PROJECT_SECTIONS table so adding a project is one line; add Slides
  and Instructor Site sections.
- generate_readme_tables.py: register `slides` and `instructors`.

Configs / READMEs
- New `slides/.all-contributorsrc` and `instructors/.all-contributorsrc`
  seeded with profvjreddi.
- Add ALL-CONTRIBUTORS-LIST markers + recognize-a-contributor blurb to
  `slides/README.md` and `instructors/README.md`.
- Regenerate root, slides, instructors, and (sorted-badge drift) the
  interviews README via the generators.

Docs
- Refresh `.github/workflows/contributors/README.md` to list all 8
  projects and document the canonical-list-in-PROJECTS contract.
2026-04-19 11:02:37 -04:00
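The regex fallback described in the commit above (scan the trigger comment for type keywords, then union with the LLM result) might look roughly like the sketch below. The keyword map and the function names `keyword_types` and `merge_types` are hypothetical; the actual workflow's keyword list and code are not shown in this log.

```python
import re

# Hypothetical keyword-to-type map (an assumption, not the workflow's real list).
KEYWORD_TO_TYPE = {
    "bug": "bug", "bugs": "bug",
    "code": "code",
    "doc": "doc", "docs": "doc",
    "test": "test", "tests": "test",
}

# Whole-word, case-insensitive match on any known keyword.
KEYWORD_RE = re.compile(
    r"\b(" + "|".join(sorted(KEYWORD_TO_TYPE, key=len, reverse=True)) + r")\b",
    re.IGNORECASE,
)


def keyword_types(comment: str) -> set[str]:
    """Deterministically extract contribution types from the comment text."""
    return {KEYWORD_TO_TYPE[match.lower()] for match in KEYWORD_RE.findall(comment)}


def merge_types(llm_types, comment: str) -> list[str]:
    """Union the LLM's classification with the keyword scan; nothing is dropped."""
    return sorted(set(llm_types) | keyword_types(comment))


merged = merge_types(["code"], "@all-contributors please add @alice for docs and bug")
print(merged)  # ['bug', 'code', 'doc']
```

Because the result is a union, a type the LLM misses is still recovered from the comment, and a type the LLM infers without an explicit keyword is kept.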
Vijay Janapa Reddi
ff7b4af4b6 docs(readme): update question count 5,700+ → 9,000+
Post-migration the corpus holds 9,199 published questions (the paper's
number; site pre-migration reported 8,053 due to the v1 filter-predicate
bug). Rounded display count bumped everywhere in the top-level
interviews/README.md to reflect the accurate current state.
2026-04-16 16:14:39 -04:00
Vijay Janapa Reddi
c530b709a2 fix(staffml): improve README sample question formatting
Restructure sample questions using collapsible <details> with bold
summary lines and <blockquote> for question text. Each track now has
a clean visual hierarchy: track heading → collapsible questions →
indented answers with napkin math.
2026-04-01 09:29:26 -04:00
Vijay Janapa Reddi
fb87570b3f feat(site): deploy StaffML to /staffml/, promote v3 landing page
- Move StaffML app from /interviews/ to /staffml/ (basePath, destination_dir)
- Add /interviews/ → /staffml/ redirect for old URL support
- Add staffml to site deploy skip list
- Upgrade interviews/README.md with star funnel hero + Launch StaffML CTA
- Promote v3 rich-card landing page to default index.qmd
- Archive original landing as index-v1.qmd
- Fix navbar wrapping: icon-only below 1400px, no-wrap on right-side items
- Update star CTA copy: "helps others discover" (inclusive wording)
2026-04-01 09:16:44 -04:00
Vijay Janapa Reddi
3ba5345dbf staffml: enhance README with 15 sample questions, chains, vault stats
- 15 hand-picked questions across all tracks and Bloom levels
- Each with model answer + napkin math in collapsible details
- Depth chains section with example L1→L6+ memory chain
- Vault stats table (5,700+ Qs, 1,000+ chains, 650+ concepts)
- CTA linking to the full app
2026-03-25 19:52:15 -04:00
Vijay Janapa Reddi
26e0ab3856 restructure interviews/ with vault separation and per-directory licenses
- Move corpus, taxonomy, chains, scripts into interviews/vault/
- Rename interviews/staffml/ (was interviews/staffml/) as the branded app
- Add CC BY-NC-SA 4.0 LICENSE to: book, kits, labs, slides, instructors, interviews
- Add AGPL-3.0 LICENSE to interviews/staffml/ (the app)
- Add vault LICENSE for pipeline scripts
- Update all GitHub Actions workflows for new paths
- Update README links and vault.yaml export paths
- Fix regex patterns in site/book deploy workflows

License structure:
  interviews/LICENSE      — CC BY-NC-SA 4.0 (corpus + data)
  interviews/staffml/LICENSE — AGPL-3.0 (app code)
  interviews/vault/LICENSE   — pipeline copyright
  book|kits|labs|slides|instructors/LICENSE — CC BY-NC-SA 4.0
  tinytorch/LICENSE       — Apache 2.0 (unchanged)
2026-03-25 15:18:14 -04:00
Vijay Janapa Reddi
a63a7ac484 staffml: rewrite interviews README, fix all broken links, slate SVG color
- Rewrote README to reflect current structure (no more cloud/, edge/ dirs)
- Removed 10 broken links to non-existent files
- Updated title to "StaffML: ML Systems Interview Playbook"
- Added development instructions and CI/CD reference
- Curriculum map SVG: slate color for StaffML box (#475569)
2026-03-25 13:11:20 -04:00
Vijay Janapa Reddi
0e09491a85 staffml: add smoke tests to CI, build badge on README
- Smoke tests in both dev and live workflows: corpus integrity, required
  fields, valid levels, taxonomy size, manifest consistency, static assets
- Build fails if critical pages missing or data is malformed
- README: StaffML build badge, updated question count and platform status
2026-03-25 09:22:34 -04:00
Vijay Janapa Reddi
88e52d52b5 fix(interviews): data hygiene pass — 100% competency mapping, dedup, parser fixes
- Expand NORMALIZE_MAP from 265→1,098 entries for 100% topic→area coverage
- Fix build_corpus.py: robust L6+ level extraction, strip trailing quotes,
  populate competency_area field via taxonomy mapping
- Remove 12 duplicate question pairs from mobile/tinyml source markdown
- Fix corrupted L6+ badge URLs in cloud/02 and mobile/01
- Update all READMEs with accurate post-dedup counts (3,180 questions)
- Create AIG pipeline SVG diagram (interviews/images/svg/aig-pipeline.svg)
- Sync enriched corpus to StaffML app, verify build passes
2026-03-24 08:49:44 -04:00
Vijay Janapa Reddi
e9aadf39a7 feat(interviews): vault loop scales corpus to 3,192 questions with balance optimization
20-round overnight generation loop filled the 3D coverage cube
(track × level × competency) from 1,964 to 3,192 questions.

- vault_loop.py: autonomous generation with balance detection (CV metric)
- Cloud: 819 questions, Edge: 811, Mobile: 755, TinyML: 768
- Deficit reduced from 119 to 96 cells, imbalance from 66 to 47
- README updated: 1,200+ → 3,200+
- Corpus synced to StaffML app, meta descriptions updated to 3,200+
- GENERATION_PIPELINE.md: documents the AIG methodology
2026-03-24 08:49:44 -04:00
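As a rough illustration of the balance check in the commit above, the sketch below assumes "CV" means coefficient of variation (population standard deviation divided by the mean) and applies it to the four per-track counts listed in the message. The function name and the "fill the smallest track next" rule are assumptions; `vault_loop.py` itself is not shown in this log.

```python
from statistics import mean, pstdev


def coefficient_of_variation(counts):
    """Population standard deviation divided by the mean; 0.0 means perfectly balanced."""
    mu = mean(counts)
    if mu == 0:
        return 0.0
    return pstdev(counts) / mu


# Per-track counts as reported in the commit message.
track_counts = {"cloud": 819, "edge": 811, "mobile": 755, "tinyml": 768}

cv = coefficient_of_variation(list(track_counts.values()))
# Hypothetical policy: direct the next generation round at the smallest track.
most_underfilled = min(track_counts, key=track_counts.get)

print(f"CV = {cv:.3f}; next target: {most_underfilled}")
```

A lower CV indicates a more even spread across tracks, so a loop of this kind could stop once the value falls below a chosen threshold.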
Vijay Janapa Reddi
7142be5e82 feat(interviews): add AIG question generation engine + 486 new questions
Build a complete Automatic Item Generation (AIG) pipeline for the StaffML
interview question corpus. Grounded in psychometric literature: Bloom's
Revised Taxonomy, Evidence-Centered Design, and distractor theory.

Engine (interviews/engine/):
- schemas.py: Pydantic models with provenance tracking
- bloom.py: 6 cognitive levels with verb stems + question templates
- generate.py: Gemini Pro via CLI with structured JSON output
- validate.py: 7-gate validation (solver, arithmetic, dedup, readability, specificity)
- embed.py: ChromaDB + nomic-embed-text-v1.5 for gap analysis + dedup
- taxonomy.py: 12 competency areas, 101 canonical tags, weighted target matrix
- quality.py: 5-check programmatic quality suite
- report.py: Interactive HTML report with UMAP, BERTopic, 3D coverage cube
- cli.py: Rich CLI with progress bars, panels, tables
- generate.py (root): One-command runner with saturation detection

Corpus changes:
- 1,116 to 1,602 questions (+486 generated, 44% growth)
- L1/L2 coverage: 47 to 315 questions (7x increase)
- 100% 2D coverage (track x level), 66% weighted 3D coverage
- Cleaned headers: stripped Bloom jargon, added L1/L2 sections
- Normalized all tags to canonical 101-tag taxonomy
- Fixed hardcoded path in build_corpus.py
2026-03-24 08:49:43 -04:00
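The growth figures quoted under "Corpus changes" above can be re-derived from the raw counts in the message. This is a quick arithmetic check, not code from the repository.

```python
# Counts taken from the commit message.
before, after = 1116, 1602
l12_before, l12_after = 47, 315

added = after - before
growth_pct = 100 * added / before
l12_ratio = l12_after / l12_before

print(f"+{added} questions, {growth_pct:.1f}% growth, L1/L2 ratio {l12_ratio:.1f}x")
```

Both rounded values agree with the "44% growth" and "7x increase" stated in the commit.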
Vijay Janapa Reddi
1f6a8d9672 docs(interviews): update README and numbers tracking 2026-03-21 08:29:55 -04:00
Vijay Janapa Reddi
0f33255b59 refactor(interviews): reorganize 1,063 questions by system scope
Restructure all 4 tracks from arbitrary round-based files to
learner-journey-based scopes. Each file represents the system
the student is reasoning about, with competency sub-sections
and L3→L6+ mastery levels inside.

Cloud: Single Machine → Distributed Systems → Serving Stack → Production Ops
Edge: Hardware Platform → Real-Time Pipeline → Deployed System
Mobile: Device & SoC → App Experience → Ship & Update
TinyML: Microcontroller → Sensing Pipeline → Deployed Device

Old round files preserved in _legacy/ folders. All cross-references
updated in README, STUDY_GUIDE, TOPIC_MAP, _quarto.yml, and index.qmd.
2026-03-20 10:40:55 -04:00
Vijay Janapa Reddi
49e73bc3ae docs: update interview playbook count to 1000+ 2026-03-18 16:48:07 -04:00
Vijay Janapa Reddi
d8addae4b0 docs: expand interview questions across all tracks and add study guide
Add new questions to cloud, edge, mobile, and tinyml tracks. Update
NUMBERS, TOPIC_MAP, and README metadata. Add 4-week STUDY_GUIDE.
2026-03-18 14:51:08 -04:00
Vijay Janapa Reddi
62c8d8815e interview updates 2026-03-17 17:30:13 -04:00
Vijay Janapa Reddi
338fd8f911 docs: correct mathematical inaccuracies in interview flashcards
Fix several mathematical calculations in the system design interview flashcards:

- Correct pipeline bubble fraction formula to (P-1)/(M+P-1)

- Fix Erlang M/M/1 queuing length calculation (L vs Lq)

- Fix speculative decoding pass count calculation

- Correct TinyML BLE gateway bandwidth math (5 gateways, not 1)

- Correct mobile monitoring storage math (1 byte vs 0.1 KB)
2026-03-17 12:52:43 -04:00
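Two of the corrected formulas in the commit above are easy to check numerically. The sketch below assumes P is the number of pipeline stages and M the number of microbatches, and uses the standard M/M/1 results L = ρ/(1−ρ) for the mean number in the system and Lq = ρ²/(1−ρ) for the mean number waiting. The function names are illustrative and not taken from the flashcards.

```python
def pipeline_bubble_fraction(stages: int, microbatches: int) -> float:
    """Idle fraction of a pipeline schedule: (P - 1) / (M + P - 1)."""
    return (stages - 1) / (microbatches + stages - 1)


def mm1_lengths(arrival_rate: float, service_rate: float) -> tuple[float, float]:
    """Return (L, Lq) for a stable M/M/1 queue.

    L counts jobs in the whole system, including the one in service;
    Lq counts only the jobs waiting in the queue.
    """
    if arrival_rate >= service_rate:
        raise ValueError("M/M/1 is only stable when arrival_rate < service_rate")
    rho = arrival_rate / service_rate  # server utilization
    l_system = rho / (1 - rho)
    l_queue = rho**2 / (1 - rho)
    return l_system, l_queue


bubble = pipeline_bubble_fraction(stages=4, microbatches=12)
l_system, l_queue = mm1_lengths(arrival_rate=8.0, service_rate=10.0)
print(f"bubble fraction = {bubble:.2f}")
print(f"L = {l_system:.2f}, Lq = {l_queue:.2f}")
```

The difference L − Lq equals the utilization ρ, which is the mean number of jobs in service; confusing the two quantities overstates the waiting line by exactly that amount.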
Vijay Janapa Reddi
aadaf5b13a docs: convert all README markdown tables to HTML format
Standardize table formatting across 25 README files to use
HTML tables with consistent styling (thead/tbody, column widths,
bold labels) matching the main README's presentation.
2026-03-17 08:57:21 -04:00
Vijay Janapa Reddi
9d70bfd9be feat(interviews): restructure playbook into 4 deployment tracks with 130+ questions
Reorganize the ML Systems Interview Playbook from a flat structure into
a 2D matrix of mastery levels (L3-L6+) and deployment tracks (Cloud,
Edge, Mobile, TinyML). Add TOPIC_MAP.md as the master planning document
defining 10 universal competency areas and their track-specific
manifestations.

New content:
- cloud/06_Advanced_Systems.md: 14 questions (FlashAttention, KV-cache,
  FP16/BF16, MoE memory, power/thermal, security/fairness)
- edge/02_Edge_Advanced.md: 18 questions (roofline, DRAM budgeting,
  WCET, thermal throttling, pruning, adversarial patches)
- mobile/02_Mobile_Advanced.md: 18 questions (NPU delegation, CoreML
  FP16 precision bugs, on-device LLM, battery, federated learning)
- tinyml/02_TinyML_Advanced.md: 18 questions (MAC budget, tensor arena,
  quantization, NAS, power harvesting, FOTA, flash extraction)

Expert-validated improvements:
- Edge thermal throttling corrected to 80°C (per NVIDIA thermal guide)
- Mobile precision question updated with real CoreML FP16 bugs (Mish
  activation errors, BatchNorm epsilon type mismatch from coremltools
  issues #2359, #2470, #2625)
- Cloud KV-cache question extended with Llama-2-70B scaling example
- TinyML operator support notes expanded to cover STM32Cube.AI
2026-03-16 19:16:45 -04:00
Vijay Janapa Reddi
3eb47a6368 docs(README): clarify curriculum structure and fix link integrity
Add "Why One Repository" section with authorial voice explaining the
integrated curriculum design. Replace flat component list with tiered
curriculum diagram (SVG) showing how textbook, labs, TinyTorch,
MLSys·im, hardware kits, and interview playbook connect. Split tables
into "For Students" and "For Educators" sections. Fix links so live
components point to mlsysbook.ai and in-development components point
to repo READMEs. Add dev banners to component READMEs (kits, mlsysim,
labs, interviews, slides). Update branch guide to clarify what is live
at mlsysbook.ai vs under development on dev.
2026-03-16 18:57:18 -04:00
Vijay Janapa Reddi
7e022c115a feat(interviews): overhaul ML Systems Interview Playbook
Rebrand from "Interview Hub" to "The ML Systems Interview Playbook"
and restructure for depth, navigation, and educational value.

- Add "Numbers Every ML Systems Engineer Should Know" reference table
  grounded in the textbook's constants.py (invariants, scaling rules,
  hardware snapshot)
- Enhance all 36 flashcard questions with Common Mistake, Napkin Math,
  and Key Equation fields where applicable
- Expand Round 5 (Visual Architecture Debugging) from 2 to 7 Mermaid
  diagram challenges
- Strengthen Architect's Rubric from 3 to 6 evaluation axes with
  scoring guide
- Add topic sections within each round, sorted by difficulty
- Add inline topic tags and cross-cutting Topic Index in README
- Delete duplicate 06_System_Design_Rubric.md
- Feature playbook in root README (top nav, Learning Stack, Start Here)
2026-03-16 16:08:31 -04:00
Vijay Janapa Reddi
a2000fea41 style(vol2): manually fix overlapping text and overly thick arrows in diagrams 2026-03-16 16:08:31 -04:00
Vijay Janapa Reddi
dd0c7c2841 docs(interviews): add Round 5 Visual Debugging to Hub index 2026-03-16 08:57:46 -04:00
Vijay Janapa Reddi
d36954b5ac docs(interviews): remove community LLM prompt to maintain focus on seed data 2026-03-15 18:22:38 -04:00
Vijay Janapa Reddi
f078a6f8dc docs(interviews): add personal mission note to Hub README 2026-03-15 18:12:51 -04:00
Vijay Janapa Reddi
5550760cfb refactor(interviews): restructure hub to match Meta/OpenAI 4-round system design loop
- Delete old textbook-based buckets.
- Introduce 4 industry-aligned 'Rounds': Single-Node Physics, Distributed Infrastructure, Production Serving, and Operations/Economics.
- Migrate and adapt seeded questions into the new Round format.
- Update Hub README to emphasize 'Systems-First' hiring philosophy.
2026-03-15 18:10:19 -04:00
Vijay Janapa Reddi
c2b90ae030 feat(interviews): systematically seed hub with textbook-derived tiered questions
- Implement Level 1 (Screen), Level 2 (Architect), and Level 3 (Lead) taxonomy.
- Extract all seed questions directly from Volume I and Volume II chapter math/callouts.
- Remove generic 'AI slop' questions and replace with high-signal 'Silicon Realist' physics.
- Update Hub README to explain the Funnel of Mastery.
2026-03-15 18:01:41 -04:00
Vijay Janapa Reddi
6d3c9e2ca4 refactor: restore high-value pedagogical structure to Interview Hub
- Restore 'Realistic Solution' format for all questions.
- Re-integrate 'Whiteboard Challenges' to Hub landing page.
- Re-enable 'Deep Dive' links to textbook chapters.
- Maintain minimalist 'Hub' branding while preserving study-path depth.
2026-03-15 17:43:34 -04:00
Vijay Janapa Reddi
41799ef4c8 refactor: convert Interview Hub into a minimalist question collection
- Remove marketing branding, 'Deep Dive' links, and 'Walls' cheatsheet.
- Simplify Interview Hub into professional question/answer categories.
- Ensure all content is technical and free of AI-generated commentary.
2026-03-15 17:41:35 -04:00
Vijay Janapa Reddi
98d795e17c docs: add WIP disclaimer to Interview Hub and improve lab/system metadata
- Add 'Work in Progress' note to interviews/README.md.
- Standardize lab terminology (Act -> Part) in orientation lab.
- Enhance scaling law constants with SystemAssumption metadata in mlsysim.
2026-03-15 17:23:01 -04:00
Vijay Janapa Reddi
a6e3fe95f9 feat: launch AI Systems Interview Hub & LeetCode Practice Arena
- Implement 'The Blueprint' Interview Guide and Flashcard Hub.
- Add 'AI Systems Arena' for LeetCode-style design challenges.
- Integrate 'mlsysim audit' CLI for local hardware profiling.
- Setup GitHub Actions for automated contributor recognition and welcome messages.
- Add dynamic 'Trending Questions' leaderboard based on community upvotes.
- Update root README and main landing page for practitioner focus.
2026-03-15 17:21:39 -04:00