Commit Graph

10743 Commits

Author SHA1 Message Date
Vijay Janapa Reddi
e83d181ce8 docs: add dev branch banner, volume READMEs, and working-in-the-open section
Update main README with dev branch note, branch guide diagram, and
working-in-the-open section. Add dedicated READMEs for Volume I and
Volume II under book/quarto/contents/. Simplify Volume II status
messaging to a clean work-in-progress banner.
2026-03-08 14:22:19 -04:00
Vijay Janapa Reddi
42bd4e2a9c chore: fix formatting of inline math expressions 2026-03-08 10:23:05 -04:00
Vijay Janapa Reddi
02b46fec00 chore: commit resolved merge conflicts and unstaged changes 2026-03-08 10:17:02 -04:00
Vijay Janapa Reddi
fa89fdd11c newsletter 2026-03-08 09:56:58 -04:00
Vijay Janapa Reddi
1c6465fc36 chore(frameworks): update TikZ figures from PR 1219 2026-03-08 09:48:18 -04:00
Vijay Janapa Reddi
cb5034df98 CARD plots 2026-03-08 09:39:47 -04:00
Vijay Janapa Reddi
395e8fd211 docs(mlsysim): standardize layout, fix landing page visuals, and enable Registry-based sorting 2026-03-07 18:39:42 -05:00
Vijay Janapa Reddi
90e4b849b8 feat(mlsysim): implement Registry pattern for coherent data sorting and storage 2026-03-07 18:39:42 -05:00
Vijay Janapa Reddi
aed43c5b81 docs: clean up landing page and centralize math foundations
- Elevate 5-Layer Progressive Lowering mental model to architecture.qmd
- Clean up landing page copy to be a punchy one-liner
- Re-render architecture composition diagram as SVG for reliability
- Move math derivations out of tutorials and into math.qmd with citations
- Add DGX Spark to Silicon Zoo
2026-03-07 18:37:06 -05:00
Vijay Janapa Reddi
2409f6c20b chore: remove old book/quarto/mlsys shim directory and migrate tools/tests 2026-03-07 17:25:29 -05:00
Vijay Janapa Reddi
aa0c690a6f feat: add newsletter system with Buttondown integration and CLI commands
Adds newsletter infrastructure: CLI commands (new, list, preview, publish,
fetch, status) integrated into binder, Quarto archive site config for
mlsysbook.ai/newsletter/, and 12-month editorial content plan. Drafts
are gitignored for private local writing; sent newsletters are committed
as the public archive.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-07 17:22:52 -05:00
Vijay Janapa Reddi
0b021d407f Updated equations 2026-03-07 17:19:57 -05:00
Vijay Janapa Reddi
c24ab1ccb9 Migrates diagrams to SVG and adds dynamic visualizations
Refactors numerous embedded TikZ diagrams in Quarto markdown files to external SVG images. This improves rendering performance, streamlines content management, and enhances cross-platform consistency.

Introduces interactive "Napkin Math" `callout-notebook` blocks, featuring Python code to generate dynamic visualizations for key system trade-offs and scenarios. Expands the `mlsysim` library with new constants and plotting utilities to support these interactive calculations and comparisons.
2026-03-07 16:15:40 -05:00
Vijay Janapa Reddi
a78f1bd8b0 feat(mlsysim): add documentation site, typed registries, and 6-solver core
Complete MLSYSIM v0.1.0 implementation with:

- Documentation website (Quarto): landing page with animated hero
  and capability carousel, 4 tutorials (hello world, LLM serving,
  distributed training, sustainability), hardware/model/fleet/infra
  catalogs, solver guide, whitepaper, math foundations, glossary,
  and full quartodoc API reference
- Typed registry system: Hardware (18 devices across 5 tiers),
  Models (15 workloads), Systems (fleets, clusters, fabrics),
  Infrastructure (grid profiles, rack configs, datacenters)
- Core types: Pint-backed Quantity, Metadata provenance tracking,
  custom exception hierarchy (OOMError, SLAViolation)
- SimulationConfig with YAML/JSON loading and pre-validation
- Scenario system tying workloads to systems with SLA constraints
- Multi-level evaluation scorecard (feasibility, performance, macro)
- Examples, tests, and Jetson Orin NX spec fix (100 → 25 TFLOP/s)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-07 15:59:51 -05:00
Vijay Janapa Reddi
3a6e5c5ef6 docs(mlsysim): ground all analytical solvers in peer-reviewed literature
Added formal citations to:
- SingleNodeSolver (Roofline Model, Williams 2009)
- DistributedSolver (3D Parallelism, Shoeybi 2019; PipeDream, Narayanan 2019)
- ServingSolver (LLM Scaling, Pope 2023)
- ReliabilitySolver (Young-Daly 1974/2006)
- Sustainability/Economics (Patterson 2021; Barroso 2018)
- Core Formulas (Amdahl 1967; Patarasuk 2009)
2026-03-07 15:31:59 -05:00
Vijay Janapa Reddi
f213260153 feat(mlsysim): align analytical solvers with industry-standard literature
Updated solvers to use literature-grade models for:
- Roofline Performance (Williams et al. 2009)
- Transformer Scaling (6PD rule, Kaplan et al. 2020)
- Training Memory (Shoeybi et al. 2019)
- Pipeline Parallelism (Huang et al. 2019)
- LLM Serving (Pope et al. 2023)
- Reliability (Young-Daly 1974/2006)

Introduced Hierarchical Communication Modeling and MFU/HFU metrics.
Fixed test suite imports and return key mismatches.
Updated Smart Doorbell scorecard reference in ml_systems.qmd.
Restored core __init__.py exports for backward compatibility.
2026-03-07 15:02:26 -05:00
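
Three of the models named in commit f213260153 reduce to one-line formulas. The sketch below is illustrative only: the function names and the example device numbers are assumptions made for this note, not the mlsysim API, and the Young-Daly form shown is the first-order approximation.

```python
import math


def roofline_flops(peak_flops: float, mem_bandwidth: float, intensity: float) -> float:
    """Attainable FLOP/s under the Roofline model (Williams et al. 2009).

    intensity is arithmetic intensity in FLOPs per byte moved.
    """
    return min(peak_flops, mem_bandwidth * intensity)


def training_flops(params: float, tokens: float) -> float:
    """Approximate training compute from the 6PD rule (Kaplan et al. 2020)."""
    return 6.0 * params * tokens


def young_daly_interval(checkpoint_cost_s: float, mtbf_s: float) -> float:
    """First-order optimal checkpoint interval: sqrt(2 * C * MTBF)."""
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)


if __name__ == "__main__":
    # Hypothetical device: 100 TFLOP/s peak, 1 TB/s memory bandwidth.
    print(roofline_flops(100e12, 1e12, 10.0))    # low intensity: bandwidth-limited
    print(roofline_flops(100e12, 1e12, 500.0))   # high intensity: capped at peak
    print(training_flops(1e9, 20e9))             # 1B parameters, 20B tokens
    print(young_daly_interval(60.0, 24 * 3600.0))  # 60 s checkpoint, 1-day MTBF
```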
Vijay Janapa Reddi
a2038a0121 feat: deploy Quarto landing site as dev preview homepage
Replace the static dev-landing/index.html with the full Quarto-rendered
landing site (book covers, TinyTorch terminal, Hardware Kits, Labs hex
animation, neural-bg). The preview workflow now installs Quarto, renders
landing_site/index.qmd, rewrites mlsysbook.ai URLs to dev-site relative
paths, and deploys the output to the site root.
2026-03-07 07:39:43 -05:00
Vijay Janapa Reddi
3c6b274ce8 feat: add two-volume dev landing pages and fix preview workflow checkout
- Update dev landing to link to /book/ volume chooser instead of old single-volume path
- Add book-index.html as volume picker page deployed at /book/ (Vol I + Vol II cards)
- Fix preview workflow to checkout from dev branch (ref: dev) so landing pages always come from dev
- Add book-index.html to sparse checkout and deploy step
2026-03-06 20:43:27 -05:00
Vijay Janapa Reddi
45db067464 feat: update publish-live workflow for two-volume release assets
Replace single-file PDF/EPUB upload with per-volume release assets:
- Collect Vol1 + Vol2 PDFs and EPUBs into release-assets/ directory
- Upload all assets via loop with automatic content-type detection
- Update download URLs and summary to reference volume-specific files
- Remove hardcoded Machine-Learning-Systems.pdf/epub references
2026-03-06 18:14:45 -05:00
Vijay Janapa Reddi
9f11116c4b fix: use find-based file discovery in compress steps and preview deploy
Replace hardcoded Machine-Learning-Systems.pdf/.epub filenames with
find-based discovery in all 4 compress steps (Linux/Windows PDF/EPUB)
and the dev preview deploy workflow. This ensures correct packaging
regardless of the output filename set in Quarto YAML configs.
2026-03-06 18:11:11 -05:00
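
The idea behind commit 9f11116c4b, discovering build outputs by pattern instead of hardcoding one filename, can be sketched in Python. The workflow itself uses find; pathlib is swapped in here for illustration, and `discover_outputs` is a hypothetical helper name.

```python
import tempfile
from pathlib import Path
from typing import List


def discover_outputs(build_dir: Path, suffix: str) -> List[Path]:
    """Return every file under build_dir whose name ends with suffix, sorted."""
    return sorted(p for p in build_dir.rglob(f"*{suffix}") if p.is_file())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "vol1").mkdir()
        (root / "vol1" / "Machine-Learning-Systems-Vol1.pdf").write_bytes(b"")
        (root / "vol1" / "notes.txt").write_bytes(b"")
        # Only the PDF is picked up, whatever its exact filename is.
        for pdf in discover_outputs(root, ".pdf"):
            print(pdf.name)
```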
Vijay Janapa Reddi
9b81b58cef feat: add volume-specific output filenames to Quarto configs
Set output-file to Machine-Learning-Systems-Vol1 and
Machine-Learning-Systems-Vol2 in all PDF and EPUB configs.
This makes output filenames clean and distinguishable,
serving as single source of truth for downstream packaging.
2026-03-06 18:10:47 -05:00
Vijay Janapa Reddi
059291f243 fix: resolve Windows Unicode encoding errors in PDF builds
- Add PYTHONUTF8=1 env var to all Windows Docker run commands (PEP 540)
- Fix generate_figure_list.py to explicitly use encoding='utf-8' in
  write_text() instead of relying on system default (cp1252 on Windows)
- The ≈ character (\u2248) in Vol I content triggered charmap codec errors
build-verified-windows-v1
2026-03-06 17:18:37 -05:00
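
A minimal sketch of the encoding fix described in commit 059291f243, assuming a hypothetical `write_figure_list` helper (the real generate_figure_list.py is not shown in this log). Passing `encoding="utf-8"` explicitly keeps `Path.write_text` from falling back to the platform default.

```python
import tempfile
from pathlib import Path
from typing import List


def write_figure_list(path: Path, lines: List[str]) -> None:
    """Write text with an explicit encoding instead of the platform default."""
    # Without encoding=..., write_text uses the locale encoding (unless
    # Python's UTF-8 mode is enabled), which is often cp1252 on Windows
    # and cannot encode U+2248.
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "figures.txt"
        write_figure_list(out, ["Figure 1: latency \u2248 2 ms"])
        print(out.read_text(encoding="utf-8"))
    # The failure mode can be reproduced on any platform:
    try:
        "\u2248".encode("cp1252")
    except UnicodeEncodeError as exc:
        print(f"cp1252 cannot encode U+2248: {exc.reason}")
```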
Vijay Janapa Reddi
c492de47a4 fix: set UTF-8 encoding in Windows Docker containers
Add PYTHONIOENCODING=utf-8 env var to all Windows Docker run commands,
and set console/output encoding to UTF-8 in the build script. Fixes
UnicodeEncodeError for characters like ≈ (\u2248) that the default
Windows code page (cp1252) cannot encode.
2026-03-06 16:09:42 -05:00
Vijay Janapa Reddi
e3c647e1c3 fix: convert preflight here-string from expandable to literal for file-based execution
The preflight script used @"..."@ (expandable here-string) with backtick-escaped
variables, which produced invalid PowerShell when written to a .ps1 file. Switch
to @'...'@ (literal here-string) with normal PowerShell syntax. Format-specific
checks (PDF/EPUB) are appended conditionally using GitHub Actions expressions.
2026-03-06 15:16:16 -05:00
Vijay Janapa Reddi
4a58fce6ea fix: rewrite all Windows Docker steps to use file-based script execution
Replace pipe-based PowerShell execution (`$script | docker run ... -Command -`)
with file-based execution (`docker run ... -File script.ps1`) for all Windows
Docker steps: preflight, build, PDF compress, and EPUB compress.

The pipe pattern swallows container stdout and doesn't propagate exit codes,
causing builds to appear successful without actually rendering content.
2026-03-06 14:47:55 -05:00
Vijay Janapa Reddi
99925bed34 docs(mlsysim): add initial architecture and development plan 2026-03-06 12:42:58 -05:00
Vijay Janapa Reddi
7183fb6087 fix: use full path for pwsh in Windows Docker containers
The Windows container image installs PowerShell 7 at a fixed path
but the short name 'pwsh' is not resolving in docker run commands.
Use the full path 'C:\Program Files\PowerShell\7\pwsh.exe' instead.
2026-03-06 12:41:34 -05:00
Vijay Janapa Reddi
74000e6077 ci: restore Windows build infrastructure
Restore Windows container build support that was accidentally removed
in commits a76aab467..a90c8803f. This restores:

- Windows Docker infrastructure (book/docker/windows/)
- Windows container build workflow (infra-container-windows.yml)
- Windows matrix entries in book-build-container.yml
- Windows health check support in infra-health-check.yml
- Windows build flags in book-validate-dev.yml and book-publish-live.yml

Restored from pre-removal state at f85e319d6.
2026-03-06 12:13:02 -05:00
Vijay Janapa Reddi
5fb6d6d354 fix: remove stray closing braces in infra-health-check.yml
Remove leftover closing braces that were causing GitHub Actions
workflow parse errors.
2026-03-06 12:00:13 -05:00
Vijay Janapa Reddi
3234a4fbc0 fix: remove trailing OR operators in infra-health-check.yml 2026-03-06 10:14:56 -05:00
Vijay Janapa Reddi
a90c8803ff fix: remove remaining windows outputs from book-build-container.yml 2026-03-06 10:14:00 -05:00
Vijay Janapa Reddi
702a97d00a fix: remove reference to deleted inputs.build_windows 2026-03-06 10:13:12 -05:00
Vijay Janapa Reddi
d02bd6ddd7 fix: remove duplicate shell key in infra-health-check.yml 2026-03-06 10:11:44 -05:00
Vijay Janapa Reddi
a76aab4676 ci: completely remove Windows build infrastructure
Per request, removed all traces of Windows container builds from the project.
This simplifies the CI pipeline to be Linux-only.

- Deleted `book/docker/windows/` directory and its Dockerfile
- Deleted `.github/workflows/infra-container-windows.yml`
- Removed Windows matrix jobs and steps from `book-build-container.yml`
- Removed Windows inputs and outputs from `book-build-container.yml`
- Removed Windows health checks from `infra-health-check.yml`
- Removed Windows references from `book-publish-live.yml`
- Removed Windows references from `book-validate-dev.yml`
2026-03-06 10:04:48 -05:00
github-actions[bot]
78b88fa9ce Update contributors list [skip ci] 2026-03-06 15:04:20 +00:00
Vijay Janapa Reddi
2ae9eb4a40 ci: deprecate Windows container build in dev validation
Removed Windows from the `book-validate-dev.yml` workflow. The Windows
container build is no longer supported or required for the dev validation
phase.

- Removed `build_os` input option for Windows
- Disabled Windows health check
- Removed Windows build matrix configuration
- Removed Windows success checks and reporting
2026-03-06 09:59:45 -05:00
Vijay Janapa Reddi
f85e319d64 fix: add retry logic to GitHub API requests in update_contributors.py
The update_contributors script was failing with intermittent 502 Bad Gateway
errors from the GitHub API when fetching commit history. Added retry logic
with exponential backoff (up to 3 retries) to handle transient 502, 503,
and 504 server errors gracefully.
2026-03-06 09:30:42 -05:00
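
The retry behaviour described in commit f85e319d64 can be sketched as follows. This is not the code from update_contributors.py: `fetch_with_retry`, its parameters, and the `(status, body)` return shape are assumptions chosen to keep the example self-contained.

```python
import time
from typing import Callable, Tuple

# Transient server errors named in the commit message.
RETRYABLE_STATUSES = {502, 503, 504}


def fetch_with_retry(
    fetch: Callable[[], Tuple[int, str]],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call fetch() and retry on transient server errors.

    fetch is a hypothetical zero-argument callable returning
    (status_code, body). Delays grow as base_delay * 2**attempt.
    """
    for attempt in range(max_retries + 1):
        status, body = fetch()
        if status not in RETRYABLE_STATUSES:
            return status, body
        if attempt < max_retries:
            sleep(base_delay * 2 ** attempt)
    # All retries exhausted: hand the last response back to the caller.
    return status, body
```

Injecting `sleep` keeps the backoff schedule easy to check without waiting; a real script would leave it at the default.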
Vijay Janapa Reddi
b77c8cc8ab fix: resolve PowerShell ScriptBlock parsing error in Windows docker run commands
Refactored Windows container build steps to pipe PowerShell scripts via stdin
instead of passing them as command-line arguments to `docker run`. This prevents
PowerShell from incorrectly interpreting curly braces inside the script string
as a ScriptBlock argument.

Applied this fix to:
- Preflight toolchain (Windows)
- Build format (Windows)
- Compress PDF (Windows)
- Compress EPUB (Windows)
2026-03-06 09:15:01 -05:00
github-actions[bot]
974df145d1 Update contributors list [skip ci] 2026-03-06 13:16:53 +00:00
Vijay Janapa Reddi
1d3bcddd0d Harden remaining Windows PowerShell interpolations.
Replace subexpression-based OS/architecture logging in dockerized Windows compression steps with format strings to avoid escaping-related parse failures.
2026-03-06 08:06:16 -05:00
Vijay Janapa Reddi
60cffb80b7 Fix Windows preflight command-source logging escapes.
Replace embedded subexpression interpolation in the dockerized PowerShell preflight checks with format strings so command source logging does not break script parsing.
2026-03-06 08:04:36 -05:00
Vijay Janapa Reddi
2ce2919be8 Ensure Windows runner starts Docker daemon before pulls.
Add a pre-pull Windows step that starts the docker service and waits until docker version succeeds so container image pulls do not fail on missing docker_engine pipe.
2026-03-06 07:52:23 -05:00
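
The wait-until-ready step in commit 2ce2919be8 is a polling loop. The workflow step itself runs in PowerShell; the Python sketch below only illustrates the pattern, and both helper names and the attempt and interval defaults are assumptions.

```python
import subprocess
import time
from typing import Callable, Sequence


def command_succeeds(cmd: Sequence[str]) -> bool:
    """Return True when cmd exits with status 0."""
    try:
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
    except OSError:
        # Executable missing or not runnable: treat as "not ready yet".
        return False
    return result.returncode == 0


def wait_until_ready(
    check: Callable[[], bool],
    attempts: int = 30,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll check() until it returns True or attempts run out."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            sleep(interval_s)
    return False
```

A caller would gate image pulls on something like `wait_until_ready(lambda: command_succeeds(["docker", "version"]))`.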
Vijay Janapa Reddi
3f88e2a89b Fix container preflight checks across Linux and Windows.
Make pandoc preflight succeed when only Quarto-bundled pandoc is available, and harden Windows error logging to avoid PowerShell interpolation failures in catch blocks.
2026-03-06 07:38:00 -05:00
Vijay Janapa Reddi
4494e30e71 Add ~88 SVG figures across all 16 Vol 2 chapters with prose integration
Visual audit identified ~85 high-priority figure gaps across Vol 2.
Three categories of work completed:

- Activated ~22 shelved SVGs (renamed from _prefix, wired into QMD)
- Created ~66 new SVGs from scratch following svg-style.md standards
- Replaced ~15 TikZ-in-callout blocks with proper @fig- labeled SVGs

All figures use sandwich prose pattern: intro sentence before the
figure div (telling students what to look for) and takeaway sentence
after (stating the key insight). 47 figures received prose fixes to
ensure complete integration.

SVG standards: viewBox 0 0 680 460, semantic color palette,
Helvetica Neue typography, arrow markers in defs, ≤250-char alt text.
2026-03-05 19:03:22 -05:00
Vijay Janapa Reddi
5885f12344 Fix double .pdf.pdf extension in PDF output filenames
Drop explicit .pdf from output-file in both vol1 and vol2 PDF configs.
Quarto appends the format extension automatically, so including it
produced Machine-Learning-Systems.pdf.pdf.
2026-03-05 18:39:40 -05:00
Vijay Janapa Reddi
f725294b52 Deduplicate Vol 2 principles: single source of truth in part files
Remove 14 duplicate .callout-principle blocks from chapter files,
replacing each with prose containing \ref{nte-...} back-references
to the canonical declaration in the part-level principles files.
This mirrors Vol 1's "declare once, reference everywhere" pattern.

Also reclassifies 3 orphan chapter-level observations from
.callout-principle to .callout-perspective, and adds 7 new
cross-references threading principles across chapters.
2026-03-05 18:07:23 -05:00
Vijay Janapa Reddi
4f44655224 Grant actions read permission to dev validation workflow.
Allow reusable container workflow status queries from the caller and fix a stylesheet spelling issue required by pre-commit checks.
2026-03-05 17:35:07 -05:00
Vijay Janapa Reddi
ce4dc6c483 Ensures blank lines precede Markdown lists
Adds a utility script to enforce proper Markdown list rendering. This addresses an issue where Quarto/Pandoc might incorrectly parse lists as paragraph continuations if not preceded by a blank line. Applies this formatting fix across all Quarto files by inserting the necessary blank lines.
2026-03-05 16:26:31 -05:00
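
A rough sketch of the list fix described in commit ce4dc6c483, not the utility script itself. It assumes simple bullet and numbered markers, skips fenced code blocks, and ignores edge cases such as YAML front matter and lazy continuation lines; `ensure_blank_before_lists` is a hypothetical name.

```python
import re

# Bullet ("-", "*", "+") or numbered ("1.", "1)") marker followed by whitespace.
LIST_ITEM = re.compile(r"^\s*(?:[-*+]|\d+[.)])\s+")
FENCE = re.compile(r"^\s*(```|~~~)")


def ensure_blank_before_lists(text: str) -> str:
    """Insert a blank line before any list that directly follows a paragraph."""
    out = []
    in_fence = False
    in_list = False
    for line in text.split("\n"):
        if FENCE.match(line):
            in_fence = not in_fence
            in_list = False
        elif in_fence:
            pass  # never touch content inside fenced code
        elif LIST_ITEM.match(line):
            # A list is starting right after a non-blank line: separate them.
            if not in_list and out and out[-1].strip():
                out.append("")
            in_list = True
        elif not (in_list and line[:1] in (" ", "\t")):
            # A blank or unindented line ends the current list; indented
            # lines are treated as continuations of the previous item.
            in_list = False
        out.append(line)
    return "\n".join(out)
```

Text that already has the blank line, or that continues an existing list, passes through unchanged, so the function is safe to run repeatedly.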
Vijay Janapa Reddi
72d714ead8 Improve container preflight diagnostics and docs references.
Add explicit per-check preflight logging and matrix failure instance reporting in the container build workflow, and update stale documentation links and workflow/file path references.
2026-03-05 16:03:00 -05:00
Vijay Janapa Reddi
aed00cce30 Removes LLM text configuration
Eliminates the `llms-txt` configuration option from the Quarto website settings. This directive is no longer relevant or required for the site's operation, simplifying the overall configuration.
2026-03-05 15:34:53 -05:00