cs249r_book

mirror of https://github.com/harvard-edge/cs249r_book.git synced 2026-03-23 23:50:31 -05:00

Author	SHA1	Message	Date
Octopus	8117f39c3b	feat: upgrade MiniMax default model to M2.7 - Update example commands to reference MiniMax-M2.7 (latest flagship model) - MiniMax-M2.7 offers enhanced reasoning and coding capabilities - Users can also use MiniMax-M2.7-highspeed for low-latency scenarios	2026-03-18 12:15:51 -05:00
octo-patch	97237e79b0	feat: add cloud LLM provider support for caption generation Add support for OpenAI-compatible cloud APIs (OpenAI, Groq, MiniMax) as alternatives to local Ollama for caption generation in manage_captions.py. New CLI flags: --provider/-p: select LLM backend (ollama, openai, groq, minimax) --api-key: API key for cloud providers (or use env vars) --api-base: custom base URL for any OpenAI-compatible endpoint The existing Ollama workflow is unchanged (remains the default). Cloud providers use the already-installed openai SDK.	2026-03-15 15:43:03 -05:00
Vijay Janapa Reddi	c69b6ab2d1	Add book tools (agent personas, check_figure_div_syntax)	2026-02-26 15:23:19 -05:00
Vijay Janapa Reddi	e3cc9f7af3	refactor: rename ml_ml_workflow files, consolidate CLI, and clean up scripts Remove redundant ml_ prefix from ml_workflow chapter files and update all Quarto config references. Consolidate custom scripts into native binder subcommands and archive obsolete tooling.	2026-02-13 11:06:28 -05:00
Vijay Janapa Reddi	2390c3ab31	Refactor: consolidate Quarto config layers and content reorganization. Unifies Quarto metadata into shared base/format/volume fragments while carrying through chapter path, asset, and tooling updates to keep the repository consistent and easier to maintain.	2026-02-12 15:38:55 -05:00
Vijay Janapa Reddi	ff3797a1d8	Refactor: Finalize Volume 1 and update CLI/VSCode tooling - Completed full Volume 1 refactor to Safe Class Namespace pattern. - Fixed render errors and verified all 16 chapters. - Updated 'binder' CLI with native validation and maintenance namespaces. - Enhanced VS Code extension with Chapter Navigator and Run History. - Integrated 'binder validate' into pre-commit workflows.	2026-02-11 09:25:50 -05:00
Vijay Janapa Reddi	3dbaa04ebf	fix: resolve all pre-commit hook failures across Vol 1 and Vol 2 Content fixes: - Add references for all 8 appendix_machine tables in surrounding prose - Remove cross-volume refs (@sec-distributed-training, @sec-security-privacy) and replace with self-contained prose - Fix broken cross-refs (em-dashes, @sec-data-engineering → @sec-data-engineering-ml) - Fix unreferenced equations (@eq-memory-wall, @eq-training-iron-law) - Fix nested/forbidden footnotes (hw_acceleration, introduction, dl_primer) - Fix drop cap incompatibility in conclusion.qmd - Fix codespell false positive ("trough" added to ignore list) - Add closer @tbl/@fig references near definitions across all chapters - Replace inline fmt() calls with pre-computed _str variables (dl_primer) Checker improvements: - figure_table_flow_audit.py: exclude code block lines from gap calculation, add forward-reference tolerance, broaden code block detection to all fenced blocks (tikz, etc.) - check_render_patterns.py: improve $...$ parsing with shortest-match spans, add exponent exception for {python} in ^{...}, exit 0 on warnings-only	2026-02-08 02:01:49 -05:00
Vijay Janapa Reddi	4ae406160d	feat: add Quarto equation labels and cross-references across Vol 1 Add proper equation labels ({#eq-...}) and prose references (@eq-...) to 138 equations across 15 Volume 1 chapters following the gold-standard pattern from serving.qmd. Key changes: - Label all display math equations with {#eq-kebab-case-name} - Add @eq-name references in prose before each equation - Equations include: Iron Law, Amdahl's Law, Roofline Model, activation functions, backpropagation, attention mechanisms, queuing theory, quantization, and system throughput formulas Also includes: - PDF formatting improvements (newpage directives for Vol 2) - LaTeX header updates for chapter styling - Pre-commit config and validation script updates	2026-02-07 09:40:01 -05:00
Vijay Janapa Reddi	3d54da6305	fix: resolve inline Python build errors across Vol 1 chapters Fix NameError build failures in ml_systems, data_engineering, and benchmarking chapters caused by missing imports and variables referenced before their defining code cells. - ml_systems: add missing Kparam and Bparam imports from physx.constants - data_engineering: compute transfer_time_10g_md preview in setup cell, add md_math import, add deduplication-dividend-calc cell, convert hardcoded values to physics engine units - benchmarking: compute BERT roofline preview values in roofline-example-calc cell before they are referenced in narrative text, convert hardcoded values to inline Python, condense redundant footnotes Also includes physics engine integration improvements across all Vol 1 chapters: unit-safe conversions, inline Python for previously hardcoded values, streamlined footnotes with cross-references, and new content validation scripts. All 21 Vol 1 chapters pass PDF build tests.	2026-02-06 09:57:25 -05:00
Vijay Janapa Reddi	e942b552ba	fix: resolve cross-reference issues and add missing table/figure refs - Update check_unreferenced_labels.py to detect YAML id: frontmatter - Add references to all unreferenced tables and listings in Vol1 - Scope unreferenced labels hook to Vol1 only (Vol2 has WIP chapters) - Fix inline Python in LaTeX math blocks across multiple chapters - Update test_units.py to use Dense (not Sparse) H100 FLOPS values - Update validate_inline_refs.py regex to ignore escaped dollar signs Key files fixed: - appendix_algorithm.qmd: @tbl-tensor-op-ref, @fig-broadcasting-rules - appendix_data.qmd: @tbl-data-gravity, @tbl-serialization-cost - appendix_dam.qmd: @tbl-dam-overlap, @tbl-bottleneck-actions, etc. - appendix_machine.qmd: @tbl-latency-hierarchy, @tbl-hardware-cheatsheet - frameworks.qmd: @lst-gradient-accumulation, @lst-custom-autograd-function - dnn_architectures.qmd: @lst-conv_layer_spatial	2026-02-06 06:03:19 -05:00
Vijay Janapa Reddi	a6e0c81380	Update vol1 chapters and add compilation continuum visualizations	2026-02-02 13:28:35 -05:00
Vijay Janapa Reddi	7c0d3e401e	Fix index placement issues and add auto-fix script - Fix \index{} commands breaking rendering when placed before footnote definitions, div openers (:::), or on same line as headings - Add check_index_placement.py script with --fix flag to automatically detect and fix these patterns - Update training.qmd and data_engineering.qmd with corrected index placement - Include other pending content and visualization updates	2026-02-02 10:39:04 -05:00
Vijay Janapa Reddi	25d965e719	Fix inline Python rendering and add sci() base unit conversion Key changes: - sci() and sci_latex() now convert Pint quantities to base units (fixes 10^2 showing instead of 10^14 for TFLOPs values) - Add md_frac(), md_sci(), md_math() helpers for LaTeX in Markdown() - Update ml_systems.qmd with proper LaTeX fraction rendering - Add freeze: false to _quarto.yml to prevent caching issues - Update CLAUDE.md with QMD inline Python conventions - Fix LATEX_ADJACENT issues across multiple QMD files (Unicode symbols)	2026-02-02 01:18:32 -05:00
Vijay Janapa Reddi	ccd7e5f7a9	Fix table formatter HTML escaping bug and add rendering validator format_tables.py was escaping <, >, & to HTML entities inside Markdown grid tables, breaking LaTeX math and comparison operators in rendered output. Removed the escape_html_entities() calls since Quarto grid tables are Markdown, not HTML. New validate_tables.py catches rendering issues the structural formatter misses: bare pipes in LaTeX math, \frac in multiline cells, HTML entities, and missing table labels.	2026-02-01 12:41:55 -05:00
Vijay Janapa Reddi	f3680917a7	Add content validation and publishing scripts New check_references.py for cross-reference validation, preview scripts for diagram and systems-gap visualization, MIT Press release packaging script, and improvements to forbidden footnotes checker.	2026-01-31 19:46:42 -05:00
Vijay Janapa Reddi	59442493ba	Add figure completeness validation and fix missing fig-cap/fig-alt Create check_figure_completeness.py pre-commit hook that validates all figures have captions and alt-text across div, markdown, and code-cell syntaxes. Add code-cell figure support to extract_figures.py and figure_table_flow_audit.py. Fix fig-algo-efficiency missing caption in introduction.qmd and fig-business-cost-curve missing alt-text in ops.qmd. Vol 1 now passes with 199/199 figures complete.	2026-01-31 19:05:34 -05:00
Vijay Janapa Reddi	5431a5afd6	Add figure/table flow audit script for placement diagnostics Scans all .qmd chapters and reports where each figure/table is defined vs. first referenced, flagging placement gaps (LATE, EARLY, ORPHAN). Found 55 issues across 653 elements in 35 chapters. Enables targeted editorial fixes for figure/table flow.	2026-01-31 16:11:11 -05:00
Vijay Janapa Reddi	fb16e824d5	Organize scripts directory: move active scripts to subdirectories, remove stale files - Move extract_figures.py to publish/, relocate_figures.py to content/ - Delete redundant extract_figures_vol1.py and stale extract_vol2_headers.py - Remove orphaned shell scripts (check_keys, clean_build, convert_icons, etc.) - Remove workflow docs and redundant README_TABLE_FORMATTER.md - Rewrite README.md to reflect actual directory structure	2026-01-31 10:00:48 -05:00
Vijay Janapa Reddi	7ad6d51f96	Update two-volume textbook content, config, and tooling - Edit all Vol 1 and Vol 2 chapters for print readiness and pedagogical clarity - Update Quarto config files for both volumes (PDF, HTML, EPUB) - Add frontmatter updates (about, acknowledgements, socratiq) - Remove unused _brand assets (scss, favicon, scripts, manifest) - Add new utility scripts (audit_figure_placement, format_div_spacing, audit_refs) - Update format_python_in_qmd script - Add references.bib entries and seminal papers corpus	2026-01-30 02:42:59 -05:00
Vijay Janapa Reddi	843f536220	Fix table alignment and consolidate callout boxes - Fix 75 grid tables: ensure first column is always left-aligned - Update format_tables.py to enforce left-aligned first column rule - Update convert_pipe_to_grid_tables.py to enforce same rule - Consolidate redundant callout boxes in ops, model_compression, serving, dl_primer - Streamline napkin math sections for better flow	2026-01-24 12:24:33 -05:00
Vijay Janapa Reddi	3822c9a880	chore: improve section ID management script - Add support for different JSON structures (list vs dict) in quiz files - Implement two-pass approach for repair mode to ensure cross-references are updated before headers are changed - Fix ID comparison to handle leading # in section_id fields	2026-01-24 11:18:17 -05:00
Vijay Janapa Reddi	73bb54f5e4	fix: resolve forbidden footnotes and table formatting issues - Close unclosed callout divs in model_compression, training, workflow - Move footnote references outside div blocks (ops, communication) - Move footnote out of table cell in communication.qmd - Update check_forbidden_footnotes.py to handle :::: nested divs - Auto-fix table column spacing in 9 tables across 6 files All pre-commit checks now pass.	2026-01-24 09:56:14 -05:00
Vijay Janapa Reddi	20c85e00ea	refactor(content): migrate TikZ captions to fig-cap attribute Migrated 226 TikZ figure captions from markdown format to Quarto's fig-cap attribute for consistency with standard Quarto figure conventions. Before: ::: {#fig-id fig-env="figure" fig-pos="htb"} ```{.tikz} [tikz code] ``` Caption text ::: After: ::: {#fig-id fig-cap="Caption text" fig-env="figure" fig-pos="htb"} ```{.tikz} [tikz code] ``` ::: Benefits: - Consistent with standard Quarto image figure syntax - Caption metadata stays with figure ID attributes - Cleaner separation of content and metadata - Easier to parse and maintain programmatically Changes: - 27 chapter files updated - 226 TikZ captions migrated - Created migration script: book/tools/scripts/content/migrate_tikz_captions.py	2026-01-15 11:43:39 -05:00
Vijay Janapa Reddi	46e8b191bb	refactor: convert all pipe-style tables to grid format Converted 177 simple pipe-style markdown tables to reStructuredText grid-style format across both volumes for consistency with existing table standards. Changes: - Created convert_pipe_to_grid_tables.py script to automate conversion - Converted tables in 19 chapter files (9 in Vol1, 14 in Vol2) - Applied format_tables.py to ensure proper formatting: * Bold headers across all columns * Bold first column for comparison/category tables * Intelligent column alignment (left for text, right for numbers) * Proper spacing and border alignment Tables now use consistent grid format with +/- borders and alignment markers (:= for left, =: for right), matching Volume 1 standards.	2026-01-10 13:10:42 -05:00
Vijay Janapa Reddi	d8b4361154	feat: add section splitter for section-by-section editorial processing Adds a pypandoc-based section splitter utility that parses .qmd chapter files and extracts individual sections for processing. This enables guaranteed 100% coverage in editorial workflows by processing each section independently rather than entire chapters at once. Key features: - Uses pypandoc JSON AST for robust parsing (correctly ignores headers inside code blocks, callouts, and TikZ diagrams) - Falls back to regex-based block tracking if pypandoc unavailable - Extracts section metadata: title, ID, line numbers, word count - Supports listing, extraction to files, and JSON manifest output - Designed for integration with polish workflow agents Usage: python3 section_splitter.py -f chapter.qmd --list python3 section_splitter.py -f chapter.qmd --manifest python3 section_splitter.py -f chapter.qmd --get-section 3	2026-01-04 17:16:08 -05:00
Vijay Janapa Reddi	9781727d60	refactor: rename advanced_intro to introduction and update scripts - Renamed vol2/advanced_intro to vol2/introduction for consistency - Updated all scripts and configs to use vol1/ instead of core/ - Updated pre-commit config to check all contents/ not just vol1/ - Updated path references in Lua filters, Python scripts, and configs	2026-01-01 14:46:52 -05:00
Vijay Janapa Reddi	b44bfb143d	fix: correct workspace root path calculation in format_tables.py The script was using 4 parent directories to calculate workspace_root, but since the script is at book/tools/scripts/content/format_tables.py, it needs 5 parents to reach the actual repo root. This was causing "Directory not found: book/book/quarto/contents" errors in CI when the pre-commit hook passed paths like "book/quarto/contents/". 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-13 14:13:37 -05:00
Vijay Janapa Reddi	853eb03ee8	style: apply consistent whitespace and formatting across codebase	2025-12-13 14:05:34 -05:00
Vijay Janapa Reddi	ba20e892e7	fix: update hardcoded paths in utility scripts after book/ restructure Updated Python utility scripts to use correct paths with book/ prefix: - rename_downloaded_images.py: quarto/contents/labs → book/quarto/contents/labs - rename_auto_images.py: quarto/contents/labs → book/quarto/contents/labs - convert_svg_to_png.py: quarto/contents → book/quarto/contents - check_self_referential_sections.py: quarto/contents → book/quarto/contents These scripts are run from the repository root, so they need the full path including the book/ directory.	2025-12-05 15:51:03 -08:00
Vijay Janapa Reddi	7b92e11193	Repository Restructuring: Prepare for TinyTorch Integration (#1068) * Restructure: Move book content to book/ subdirectory - Move quarto/ → book/quarto/ - Move cli/ → book/cli/ - Move docker/ → book/docker/ - Move socratiQ/ → book/socratiQ/ - Move tools/ → book/tools/ - Move scripts/ → book/scripts/ - Move config/ → book/config/ - Move docs/ → book/docs/ - Move binder → book/binder Git history fully preserved for all moved files. Part of repository restructuring to support MLSysBook + TinyTorch. Pre-commit hooks bypassed for this commit as paths need updating. * Update pre-commit hooks for book/ subdirectory - Update all quarto/ paths to book/quarto/ - Update all tools/ paths to book/tools/ - Update config/linting to book/config/linting - Update project structure checks Pre-commit hooks will now work with new directory structure. * Update .gitignore for book/ subdirectory structure - Update quarto/ paths to book/quarto/ - Update assets/ paths to book/quarto/assets/ - Maintain all existing ignore patterns * Update GitHub workflows for book/ subdirectory - Update all quarto/ paths to book/quarto/ - Update cli/ paths to book/cli/ - Update tools/ paths to book/tools/ - Update docker/ paths to book/docker/ - Update config/ paths to book/config/ - Maintain all workflow functionality * Update CLI config to support book/ subdirectory - Check for book/quarto/ path first - Fall back to quarto/ for backward compatibility - Maintain full CLI functionality * Create new root and book READMEs for dual structure - Add comprehensive root README explaining both projects - Create book-specific README with quick start guide - Document repository structure and navigation - Prepare for TinyTorch integration	2025-12-05 14:04:21 -08:00

30 Commits