30 Commits

Author SHA1 Message Date
Octopus
8117f39c3b feat: upgrade MiniMax default model to M2.7
- Update example commands to reference MiniMax-M2.7 (latest flagship model)
- MiniMax-M2.7 offers enhanced reasoning and coding capabilities
- Users can also use MiniMax-M2.7-highspeed for low-latency scenarios
2026-03-18 12:15:51 -05:00
octo-patch
97237e79b0 feat: add cloud LLM provider support for caption generation
Add support for OpenAI-compatible cloud APIs (OpenAI, Groq, MiniMax)
as alternatives to local Ollama for caption generation in manage_captions.py.

New CLI flags:
  --provider/-p: select LLM backend (ollama, openai, groq, minimax)
  --api-key: API key for cloud providers (or use env vars)
  --api-base: custom base URL for any OpenAI-compatible endpoint

The existing Ollama workflow is unchanged (remains the default).
Cloud providers use the already-installed openai SDK.
2026-03-15 15:43:03 -05:00
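The flags listed in 97237e79b0 could be wired up roughly as follows. This is a minimal sketch, not the code in manage_captions.py: the function names `build_parser` and `resolve_api_key`, the environment variable names, and the error handling are assumptions made for illustration.

```python
import argparse
import os

# Hypothetical mapping from provider to the environment variable holding its
# API key. These variable names are assumptions, not taken from the commit.
ENV_KEYS = {
    "openai": "OPENAI_API_KEY",
    "groq": "GROQ_API_KEY",
    "minimax": "MINIMAX_API_KEY",
}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Caption generation (sketch)")
    parser.add_argument("--provider", "-p", default="ollama",
                        choices=["ollama", "openai", "groq", "minimax"],
                        help="LLM backend; local Ollama remains the default")
    parser.add_argument("--api-key", default=None,
                        help="API key for cloud providers (or use env vars)")
    parser.add_argument("--api-base", default=None,
                        help="custom base URL for an OpenAI-compatible endpoint")
    return parser


def resolve_api_key(args, environ=os.environ):
    """Prefer --api-key, then fall back to the provider's environment variable."""
    if args.provider == "ollama":
        return None  # the local backend needs no key
    key = args.api_key or environ.get(ENV_KEYS[args.provider])
    if not key:
        raise SystemExit(f"No API key found for provider '{args.provider}'")
    return key
```

Keeping `ollama` as the argparse default mirrors the commit's statement that the existing workflow is unchanged unless a cloud provider is selected explicitly.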
Vijay Janapa Reddi
c69b6ab2d1 Add book tools (agent personas, check_figure_div_syntax) 2026-02-26 15:23:19 -05:00
Vijay Janapa Reddi
e3cc9f7af3 refactor: rename ml_ml_workflow files, consolidate CLI, and clean up scripts
Remove redundant ml_ prefix from ml_workflow chapter files and update all
Quarto config references. Consolidate custom scripts into native binder
subcommands and archive obsolete tooling.
2026-02-13 11:06:28 -05:00
Vijay Janapa Reddi
2390c3ab31 Refactor: consolidate Quarto config layers and content reorganization.
Unifies Quarto metadata into shared base/format/volume fragments while carrying through chapter path, asset, and tooling updates to keep the repository consistent and easier to maintain.
2026-02-12 15:38:55 -05:00
Vijay Janapa Reddi
ff3797a1d8 Refactor: Finalize Volume 1 and update CLI/VSCode tooling
- Completed full Volume 1 refactor to Safe Class Namespace pattern.
- Fixed render errors and verified all 16 chapters.
- Updated 'binder' CLI with native validation and maintenance namespaces.
- Enhanced VS Code extension with Chapter Navigator and Run History.
- Integrated 'binder validate' into pre-commit workflows.
2026-02-11 09:25:50 -05:00
Vijay Janapa Reddi
3dbaa04ebf fix: resolve all pre-commit hook failures across Vol 1 and Vol 2
Content fixes:
- Add references for all 8 appendix_machine tables in surrounding prose
- Remove cross-volume refs (@sec-distributed-training, @sec-security-privacy)
  and replace with self-contained prose
- Fix broken cross-refs (em-dashes, @sec-data-engineering → @sec-data-engineering-ml)
- Fix unreferenced equations (@eq-memory-wall, @eq-training-iron-law)
- Fix nested/forbidden footnotes (hw_acceleration, introduction, dl_primer)
- Fix drop cap incompatibility in conclusion.qmd
- Fix codespell false positive ("trough" added to ignore list)
- Add closer @tbl/@fig references near definitions across all chapters
- Replace inline fmt() calls with pre-computed _str variables (dl_primer)

Checker improvements:
- figure_table_flow_audit.py: exclude code block lines from gap calculation,
  add forward-reference tolerance, broaden code block detection to all fenced
  blocks (tikz, etc.)
- check_render_patterns.py: improve $...$ parsing with shortest-match spans,
  add exponent exception for {python} in ^{...}, exit 0 on warnings-only
2026-02-08 02:01:49 -05:00
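The "shortest-match spans" change to $...$ parsing in 3dbaa04ebf can be illustrated with a small regex comparison. This is a sketch under assumptions: the pattern names and `inline_math_spans` are hypothetical, and skipping escaped dollar signs is added here for illustration rather than taken from check_render_patterns.py.

```python
import re

# Greedy: one match can swallow everything between the first and last "$".
GREEDY = re.compile(r"\$(.+)\$")

# Shortest-match: each span ends at the nearest closing "$".
# The lookbehinds skip escaped dollar signs such as "\$5" (an assumption here).
SHORTEST = re.compile(r"(?<!\\)\$(.+?)(?<!\\)\$")


def inline_math_spans(line: str) -> list[str]:
    """Return the contents of each inline math span on a line."""
    return SHORTEST.findall(line)


line = r"It costs \$5 when $a+b$ exceeds $c$."
print(inline_math_spans(line))  # ['a+b', 'c']
```

With the greedy pattern, the text between two separate spans would be reported as math, which is the kind of false positive a shortest-match rule avoids.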
Vijay Janapa Reddi
4ae406160d feat: add Quarto equation labels and cross-references across Vol 1
Add proper equation labels ({#eq-...}) and prose references (@eq-...)
to 138 equations across 15 Volume 1 chapters following the gold-standard
pattern from serving.qmd.

Key changes:
- Label all display math equations with {#eq-kebab-case-name}
- Add @eq-name references in prose before each equation
- Equations include: Iron Law, Amdahl's Law, Roofline Model,
  activation functions, backpropagation, attention mechanisms,
  queuing theory, quantization, and system throughput formulas

Also includes:
- PDF formatting improvements (newpage directives for Vol 2)
- LaTeX header updates for chapter styling
- Pre-commit config and validation script updates
2026-02-07 09:40:01 -05:00
Vijay Janapa Reddi
3d54da6305 fix: resolve inline Python build errors across Vol 1 chapters
Fix NameError build failures in ml_systems, data_engineering, and
benchmarking chapters caused by missing imports and variables referenced
before their defining code cells.

- ml_systems: add missing Kparam and Bparam imports from physx.constants
- data_engineering: compute transfer_time_10g_md preview in setup cell,
  add md_math import, add deduplication-dividend-calc cell, convert
  hardcoded values to physics engine units
- benchmarking: compute BERT roofline preview values in roofline-example-calc
  cell before they are referenced in narrative text, convert hardcoded
  values to inline Python, condense redundant footnotes

Also includes physics engine integration improvements across all Vol 1
chapters: unit-safe conversions, inline Python for previously hardcoded
values, streamlined footnotes with cross-references, and new content
validation scripts.

All 21 Vol 1 chapters pass PDF build tests.
2026-02-06 09:57:25 -05:00
Vijay Janapa Reddi
e942b552ba fix: resolve cross-reference issues and add missing table/figure refs
- Update check_unreferenced_labels.py to detect YAML id: frontmatter
- Add references to all unreferenced tables and listings in Vol1
- Scope unreferenced labels hook to Vol1 only (Vol2 has WIP chapters)
- Fix inline Python in LaTeX math blocks across multiple chapters
- Update test_units.py to use Dense (not Sparse) H100 FLOPS values
- Update validate_inline_refs.py regex to ignore escaped dollar signs

Key files fixed:
- appendix_algorithm.qmd: @tbl-tensor-op-ref, @fig-broadcasting-rules
- appendix_data.qmd: @tbl-data-gravity, @tbl-serialization-cost
- appendix_dam.qmd: @tbl-dam-overlap, @tbl-bottleneck-actions, etc.
- appendix_machine.qmd: @tbl-latency-hierarchy, @tbl-hardware-cheatsheet
- frameworks.qmd: @lst-gradient-accumulation, @lst-custom-autograd-function
- dnn_architectures.qmd: @lst-conv_layer_spatial
2026-02-06 06:03:19 -05:00
Vijay Janapa Reddi
a6e0c81380 Update vol1 chapters and add compilation continuum visualizations 2026-02-02 13:28:35 -05:00
Vijay Janapa Reddi
7c0d3e401e Fix index placement issues and add auto-fix script
- Fix \index{} commands breaking rendering when placed before footnote
  definitions, div openers (:::), or on same line as headings
- Add check_index_placement.py script with --fix flag to automatically
  detect and fix these patterns
- Update training.qmd and data_engineering.qmd with corrected index placement
- Include other pending content and visualization updates
2026-02-02 10:39:04 -05:00
Vijay Janapa Reddi
25d965e719 Fix inline Python rendering and add sci() base unit conversion
Key changes:
- sci() and sci_latex() now convert Pint quantities to base units
  (fixes 10^2 showing instead of 10^14 for TFLOPs values)
- Add md_frac(), md_sci(), md_math() helpers for LaTeX in Markdown()
- Update ml_systems.qmd with proper LaTeX fraction rendering
- Add freeze: false to _quarto.yml to prevent caching issues
- Update CLAUDE.md with QMD inline Python conventions
- Fix LATEX_ADJACENT issues across multiple QMD files (Unicode symbols)
2026-02-02 01:18:32 -05:00
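The exponent bug fixed in 25d965e719 (10^2 shown instead of 10^14 for TFLOPs values) comes from formatting the prefixed magnitude rather than the base-unit value. The sketch below reproduces the effect in plain Python. It does not use Pint, and the `PREFIX` table, the `to_base` flag, and the sample value 312 are assumptions for illustration only.

```python
import math

# Hypothetical SI prefix table; the real helpers operate on Pint quantities.
PREFIX = {"": 1.0, "k": 1e3, "M": 1e6, "G": 1e9, "T": 1e12, "P": 1e15}


def sci(magnitude: float, prefix: str = "", to_base: bool = True) -> str:
    """Format a nonzero value as 'm.mm x 10^e', converting to base units first."""
    value = magnitude * PREFIX[prefix] if to_base else magnitude
    exponent = math.floor(math.log10(abs(value)))
    mantissa = value / 10 ** exponent
    return f"{mantissa:.2f} x 10^{exponent}"


print(sci(312, "T", to_base=False))  # 3.12 x 10^2  (prefix ignored: the bug)
print(sci(312, "T"))                 # 3.12 x 10^14 (base units: the fix)
```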
Vijay Janapa Reddi
ccd7e5f7a9 Fix table formatter HTML escaping bug and add rendering validator
format_tables.py was escaping <, >, & to HTML entities inside Markdown
grid tables, breaking LaTeX math and comparison operators in rendered
output. Removed the escape_html_entities() calls since Quarto grid
tables are Markdown, not HTML.

New validate_tables.py catches rendering issues the structural formatter
misses: bare pipes in LaTeX math, \frac in multiline cells, HTML
entities, and missing table labels.
2026-02-01 12:41:55 -05:00
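The escaping bug described in ccd7e5f7a9 is easy to reproduce with the standard library. In this sketch `format_cell` is a hypothetical stand-in for the formatter's cell handling, not the actual function in format_tables.py.

```python
import html


def format_cell(text: str, escape_entities: bool = False) -> str:
    """Collapse whitespace in a grid-table cell; optionally HTML-escape it."""
    text = " ".join(text.split())
    return html.escape(text, quote=False) if escape_entities else text


cell = "$x < y$  &  $p > q$"
print(format_cell(cell, escape_entities=True))  # $x &lt; y$ &amp; $p &gt; q$
print(format_cell(cell))                        # $x < y$ & $p > q$
```

Because grid tables are Markdown, the entity form is passed through to the renderer literally, so the math and comparison operators no longer render as intended. Leaving the cell text unescaped keeps them intact.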
Vijay Janapa Reddi
f3680917a7 Add content validation and publishing scripts
New check_references.py for cross-reference validation, preview scripts
for diagram and systems-gap visualization, MIT Press release packaging
script, and improvements to forbidden footnotes checker.
2026-01-31 19:46:42 -05:00
Vijay Janapa Reddi
59442493ba Add figure completeness validation and fix missing fig-cap/fig-alt
Create check_figure_completeness.py pre-commit hook that validates all
figures have captions and alt-text across div, markdown, and code-cell
syntaxes. Add code-cell figure support to extract_figures.py and
figure_table_flow_audit.py. Fix fig-algo-efficiency missing caption in
introduction.qmd and fig-business-cost-curve missing alt-text in ops.qmd.
Vol 1 now passes with 199/199 figures complete.
2026-01-31 19:05:34 -05:00
Vijay Janapa Reddi
5431a5afd6 Add figure/table flow audit script for placement diagnostics
Scans all .qmd chapters and reports where each figure/table is
defined vs. first referenced, flagging placement gaps (LATE,
EARLY, ORPHAN). Found 55 issues across 653 elements in 35
chapters. Enables targeted editorial fixes for figure/table flow.
2026-01-31 16:11:11 -05:00
Vijay Janapa Reddi
fb16e824d5 Organize scripts directory: move active scripts to subdirectories, remove stale files
- Move extract_figures.py to publish/, relocate_figures.py to content/
- Delete redundant extract_figures_vol1.py and stale extract_vol2_headers.py
- Remove orphaned shell scripts (check_keys, clean_build, convert_icons, etc.)
- Remove workflow docs and redundant README_TABLE_FORMATTER.md
- Rewrite README.md to reflect actual directory structure
2026-01-31 10:00:48 -05:00
Vijay Janapa Reddi
7ad6d51f96 Update two-volume textbook content, config, and tooling
- Edit all Vol 1 and Vol 2 chapters for print readiness and pedagogical clarity
- Update Quarto config files for both volumes (PDF, HTML, EPUB)
- Add frontmatter updates (about, acknowledgements, socratiq)
- Remove unused _brand assets (scss, favicon, scripts, manifest)
- Add new utility scripts (audit_figure_placement, format_div_spacing, audit_refs)
- Update format_python_in_qmd script
- Add references.bib entries and seminal papers corpus
2026-01-30 02:42:59 -05:00
Vijay Janapa Reddi
843f536220 Fix table alignment and consolidate callout boxes
- Fix 75 grid tables: ensure first column is always left-aligned
- Update format_tables.py to enforce left-aligned first column rule
- Update convert_pipe_to_grid_tables.py to enforce same rule
- Consolidate redundant callout boxes in ops, model_compression, serving, dl_primer
- Streamline napkin math sections for better flow
2026-01-24 12:24:33 -05:00
Vijay Janapa Reddi
3822c9a880 chore: improve section ID management script
- Add support for different JSON structures (list vs dict) in quiz files
- Implement two-pass approach for repair mode to ensure cross-references
  are updated before headers are changed
- Fix ID comparison to handle leading # in section_id fields
2026-01-24 11:18:17 -05:00
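Two of the fixes in 3822c9a880 (list vs dict quiz JSON, and a leading # in section_id) might look like the following. This is a sketch under assumptions: the `"sections"` key and both function names are hypothetical, since the commit does not show the actual quiz file layout.

```python
def normalize_id(section_id: str) -> str:
    """Strip whitespace and any leading '#' so IDs compare consistently."""
    return section_id.strip().lstrip("#")


def iter_sections(quiz_data):
    """Yield section records from either supported JSON shape.

    Assumed shapes: a bare list of records, or a dict with a 'sections' list.
    """
    if isinstance(quiz_data, list):
        yield from quiz_data
    elif isinstance(quiz_data, dict):
        yield from quiz_data.get("sections", [])
    else:
        raise TypeError(f"Unsupported quiz structure: {type(quiz_data).__name__}")


def same_section(a: str, b: str) -> bool:
    return normalize_id(a) == normalize_id(b)
```

Normalizing both sides before comparing means a header ID and a quiz-file ID match whether or not either one carries the leading #.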
Vijay Janapa Reddi
73bb54f5e4 fix: resolve forbidden footnotes and table formatting issues
- Close unclosed callout divs in model_compression, training, workflow
- Move footnote references outside div blocks (ops, communication)
- Move footnote out of table cell in communication.qmd
- Update check_forbidden_footnotes.py to handle :::: nested divs
- Auto-fix table column spacing in 9 tables across 6 files

All pre-commit checks now pass.
2026-01-24 09:56:14 -05:00
Vijay Janapa Reddi
20c85e00ea refactor(content): migrate TikZ captions to fig-cap attribute
Migrated 226 TikZ figure captions from markdown format to Quarto's fig-cap attribute for consistency with standard Quarto figure conventions.

Before:
  ::: {#fig-id fig-env="figure" fig-pos="htb"}
  ```{.tikz}
  [tikz code]
  ```
  **Caption text**
  :::

After:
  ::: {#fig-id fig-cap="Caption text" fig-env="figure" fig-pos="htb"}
  ```{.tikz}
  [tikz code]
  ```
  :::

Benefits:
- Consistent with standard Quarto image figure syntax
- Caption metadata stays with figure ID attributes
- Cleaner separation of content and metadata
- Easier to parse and maintain programmatically

Changes:
- 27 chapter files updated
- 226 TikZ captions migrated
- Created migration script: book/tools/scripts/content/migrate_tikz_captions.py
2026-01-15 11:43:39 -05:00
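The Before/After transformation in 20c85e00ea can be sketched as a line-based rewrite. This is not the contents of migrate_tikz_captions.py: the function below is a hypothetical, simplified version that assumes unindented, non-nested figure divs and leaves captions containing double quotes for manual handling.

```python
import re

FENCE = "`" * 3  # built at runtime so this example can itself sit in a fenced block
OPENER = re.compile(r"^::: \{(#fig-[^\s}]+)(.*)\}\s*$")
CAPTION = re.compile(r"^\*\*(.+)\*\*\s*$")


def migrate_tikz_captions(text: str) -> str:
    """Move a trailing **caption** line into the div's fig-cap attribute."""
    lines = text.split("\n")
    out, i = [], 0
    while i < len(lines):
        m = OPENER.match(lines[i])
        if m and "fig-cap=" not in lines[i]:
            # Assumption: the first bare ':::' after the opener closes this div.
            close = next((j for j in range(i + 1, len(lines))
                          if lines[j].strip() == ":::"), None)
            cap = CAPTION.match(lines[close - 1].strip()) if close is not None else None
            is_tikz = lines[i + 1].startswith(FENCE + "{.tikz}") if close is not None else False
            if cap and is_tikz and '"' not in cap.group(1):
                out.append(f'::: {{{m.group(1)} fig-cap="{cap.group(1)}"{m.group(2)}}}')
                out.extend(lines[i + 1:close - 1])  # TikZ block without the caption line
                out.append(lines[close])
                i = close + 1
                continue
        out.append(lines[i])
        i += 1
    return "\n".join(out)
```

Running the function on already-migrated text leaves it unchanged, because openers that carry a fig-cap attribute are skipped.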
Vijay Janapa Reddi
46e8b191bb refactor: convert all pipe-style tables to grid format
Converted 177 simple pipe-style markdown tables to reStructuredText
grid-style format across both volumes for consistency with existing
table standards.

Changes:
- Created convert_pipe_to_grid_tables.py script to automate conversion
- Converted tables in 19 chapter files (9 in Vol1, 14 in Vol2)
- Applied format_tables.py to ensure proper formatting:
  * Bold headers across all columns
  * Bold first column for comparison/category tables
  * Intelligent column alignment (left for text, right for numbers)
  * Proper spacing and border alignment

Tables now use consistent grid format with +/- borders and alignment
markers (:= for left, =: for right), matching Volume 1 standards.
2026-01-10 13:10:42 -05:00
Vijay Janapa Reddi
d8b4361154 feat: add section splitter for section-by-section editorial processing
Adds a pypandoc-based section splitter utility that parses .qmd chapter
files and extracts individual sections for processing. This enables
guaranteed 100% coverage in editorial workflows by processing each
section independently rather than entire chapters at once.

Key features:
- Uses pypandoc JSON AST for robust parsing (correctly ignores headers
  inside code blocks, callouts, and TikZ diagrams)
- Falls back to regex-based block tracking if pypandoc unavailable
- Extracts section metadata: title, ID, line numbers, word count
- Supports listing, extraction to files, and JSON manifest output
- Designed for integration with polish workflow agents

Usage:
  python3 section_splitter.py -f chapter.qmd --list
  python3 section_splitter.py -f chapter.qmd --manifest
  python3 section_splitter.py -f chapter.qmd --get-section 3
2026-01-04 17:16:08 -05:00
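The regex-based fallback mentioned in d8b4361154 has to avoid treating comment lines inside code blocks as headers. A minimal sketch of that block tracking follows. The names and the returned record layout are assumptions, and unlike the pypandoc path described in the commit, this version does not handle headers inside callout divs.

```python
import re

FENCE = re.compile(r"^\s*(`{3,}|~{3,})")
HEADER = re.compile(r"^(#{1,6})\s+(.*?)\s*(\{#([^\s}]+)[^}]*\})?\s*$")


def list_sections(text: str) -> list[dict]:
    """Return header records, skipping any '#' lines inside fenced blocks."""
    sections = []
    open_fence = None  # marker of the fence we are currently inside, if any
    for lineno, line in enumerate(text.splitlines(), start=1):
        m = FENCE.match(line)
        if m:
            marker = m.group(1)
            if open_fence is None:
                open_fence = marker  # opening fence (may carry an info string)
            elif (marker[0] == open_fence[0] and len(marker) >= len(open_fence)
                  and line.strip() == marker):
                open_fence = None    # bare fence of the same kind closes the block
            continue
        if open_fence is not None:
            continue
        h = HEADER.match(line)
        if h:
            sections.append({"level": len(h.group(1)), "title": h.group(2),
                             "id": h.group(4), "line": lineno})
    return sections
```

Tracking the fence marker rather than a simple boolean keeps a longer outer fence open when a shorter fence appears inside it.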
Vijay Janapa Reddi
9781727d60 refactor: rename advanced_intro to introduction and update scripts
- Renamed vol2/advanced_intro to vol2/introduction for consistency
- Updated all scripts and configs to use vol1/ instead of core/
- Updated pre-commit config to check all contents/ not just vol1/
- Updated path references in Lua filters, Python scripts, and configs
2026-01-01 14:46:52 -05:00
Vijay Janapa Reddi
b44bfb143d fix: correct workspace root path calculation in format_tables.py
The script was using 4 parent directories to calculate workspace_root,
but since the script is at book/tools/scripts/content/format_tables.py,
it needs 5 parents to reach the actual repo root.

This was causing "Directory not found: book/book/quarto/contents" errors
in CI when the pre-commit hook passed paths like "book/quarto/contents/".

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-13 14:13:37 -05:00
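The off-by-one in b44bfb143d can be checked by counting parents explicitly. In this sketch the `/repo` checkout location and the `ancestor` helper are hypothetical. Only the script path below the repo root comes from the commit message.

```python
from pathlib import PurePosixPath

# "/repo" is an assumed checkout location; the rest is the path from the commit.
script = PurePosixPath("/repo/book/tools/scripts/content/format_tables.py")


def ancestor(path: PurePosixPath, levels: int) -> PurePosixPath:
    """Apply .parent `levels` times."""
    for _ in range(levels):
        path = path.parent
    return path


four_up = ancestor(script, 4)  # /repo/book  (one level short)
five_up = ancestor(script, 5)  # /repo       (actual repo root)

print(four_up / "book/quarto/contents")  # /repo/book/book/quarto/contents
print(five_up / "book/quarto/contents")  # /repo/book/quarto/contents
```

Joining a repo-relative path such as book/quarto/contents onto the four-level result produces the doubled book/book prefix reported in CI.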
Vijay Janapa Reddi
853eb03ee8 style: apply consistent whitespace and formatting across codebase 2025-12-13 14:05:34 -05:00
Vijay Janapa Reddi
ba20e892e7 fix: update hardcoded paths in utility scripts after book/ restructure
Updated Python utility scripts to use correct paths with book/ prefix:
- rename_downloaded_images.py: quarto/contents/labs → book/quarto/contents/labs
- rename_auto_images.py: quarto/contents/labs → book/quarto/contents/labs
- convert_svg_to_png.py: quarto/contents → book/quarto/contents
- check_self_referential_sections.py: quarto/contents → book/quarto/contents

These scripts are run from the repository root, so they need the full
path including the book/ directory.
2025-12-05 15:51:03 -08:00
Vijay Janapa Reddi
7b92e11193 Repository Restructuring: Prepare for TinyTorch Integration (#1068)
* Restructure: Move book content to book/ subdirectory

- Move quarto/ → book/quarto/
- Move cli/ → book/cli/
- Move docker/ → book/docker/
- Move socratiQ/ → book/socratiQ/
- Move tools/ → book/tools/
- Move scripts/ → book/scripts/
- Move config/ → book/config/
- Move docs/ → book/docs/
- Move binder → book/binder

Git history fully preserved for all moved files.

Part of repository restructuring to support MLSysBook + TinyTorch.

Pre-commit hooks bypassed for this commit as paths need updating.

* Update pre-commit hooks for book/ subdirectory

- Update all quarto/ paths to book/quarto/
- Update all tools/ paths to book/tools/
- Update config/linting to book/config/linting
- Update project structure checks

Pre-commit hooks will now work with new directory structure.

* Update .gitignore for book/ subdirectory structure

- Update quarto/ paths to book/quarto/
- Update assets/ paths to book/quarto/assets/
- Maintain all existing ignore patterns

* Update GitHub workflows for book/ subdirectory

- Update all quarto/ paths to book/quarto/
- Update cli/ paths to book/cli/
- Update tools/ paths to book/tools/
- Update docker/ paths to book/docker/
- Update config/ paths to book/config/
- Maintain all workflow functionality

* Update CLI config to support book/ subdirectory

- Check for book/quarto/ path first
- Fall back to quarto/ for backward compatibility
- Maintain full CLI functionality

* Create new root and book READMEs for dual structure

- Add comprehensive root README explaining both projects
- Create book-specific README with quick start guide
- Document repository structure and navigation
- Prepare for TinyTorch integration
2025-12-05 14:04:21 -08:00
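The CLI config change in 7b92e11193 (check book/quarto/ first, fall back to quarto/) could be expressed as a simple ordered lookup. This is a sketch, not the binder CLI's code: `find_quarto_dir` and its error message are hypothetical.

```python
from pathlib import Path


def find_quarto_dir(repo_root: Path) -> Path:
    """Return the Quarto directory, preferring the new book/ layout."""
    candidates = (
        repo_root / "book" / "quarto",  # layout after the restructure
        repo_root / "quarto",           # legacy layout, kept for compatibility
    )
    for candidate in candidates:
        if candidate.is_dir():
            return candidate
    raise FileNotFoundError(f"No quarto directory found under {repo_root}")
```

Ordering the candidates puts the new layout first, so a checkout that still contains a stale top-level quarto/ directory resolves to book/quarto/.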