mirror of
https://github.com/harvard-edge/cs249r_book.git
synced 2026-04-30 01:29:07 -05:00
Glossary Management Scripts
Scripts for managing the ML Systems textbook glossary system.
Quick Commands
Full Rebuild (when chapters change)
cd /Users/VJ/GitHub/MLSysBook
python3 book/tools/scripts/glossary/build_global_glossary.py
python3 book/tools/scripts/glossary/generate_glossary.py
Generate Specific Volume
python3 book/tools/scripts/glossary/generate_glossary.py --volume vol1
python3 book/tools/scripts/glossary/generate_glossary.py --volume vol2
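The commands above can also be driven from a small Python wrapper. This is a minimal sketch, not part of the repository: `rebuild_commands` and `rebuild` are hypothetical helper names, and it assumes the scripts are invoked from the repository root with the same paths and the `--volume` flag shown above.

```python
import subprocess
import sys

# Script directory as given in the commands above.
SCRIPTS_DIR = "book/tools/scripts/glossary"


def rebuild_commands(volume=None):
    """Return the argv lists for a rebuild: aggregation first, then page generation.

    Hypothetical helper. `volume` is passed through as `--volume <name>`
    (for example "vol1" or "vol2"); None generates with no volume filter.
    """
    build = [sys.executable, f"{SCRIPTS_DIR}/build_global_glossary.py"]
    generate = [sys.executable, f"{SCRIPTS_DIR}/generate_glossary.py"]
    if volume is not None:
        generate += ["--volume", volume]
    return [build, generate]


def rebuild(volume=None, repo_root="."):
    """Run each step in order and stop at the first failing script."""
    for argv in rebuild_commands(volume):
        subprocess.run(argv, cwd=repo_root, check=True)
```

Calling `rebuild()` from the repository root would mirror the full rebuild, and `rebuild("vol1")` would regenerate only the Volume 1 page after aggregation. `check=True` makes a failing aggregation step abort before the page generator runs on stale data.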
Data Flow
Chapter QMDs → Agent → Individual JSONs → build_global_glossary.py → Volume JSONs → generate_glossary.py → glossary.qmd
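To make the aggregation step in this flow concrete, here is a minimal sketch of merging several chapter glossaries into one volume-level list. It is not the actual logic of build_global_glossary.py. The JSON layout (a top-level "terms" list of objects with "term" and "definition" keys), the first-definition-wins rule, and the name `merge_terms` are all assumptions made for illustration.

```python
def merge_terms(chapter_glossaries):
    """Merge chapter glossaries into one alphabetically sorted term list.

    Assumed (hypothetical) input shape per chapter:
        {"terms": [{"term": "...", "definition": "..."}, ...]}
    Terms are deduplicated case-insensitively; the first definition seen wins.
    """
    merged = {}
    for glossary in chapter_glossaries:
        for entry in glossary.get("terms", []):
            key = entry["term"].strip().lower()
            # setdefault keeps the earlier entry when a term repeats.
            merged.setdefault(key, entry)
    return sorted(merged.values(), key=lambda e: e["term"].lower())
```

A real consolidation step would likely need a policy for conflicting definitions across chapters, which is presumably what the consolidation scripts listed below address.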
Scripts
- build_global_glossary.py - Main aggregation script (chapter JSONs → volume JSONs)
- generate_glossary.py - Page generator (volume JSONs → volume glossary.qmd files)
- clean_master_glossary.py - Legacy cleanup script
- smart_consolidation.py - Advanced term consolidation
- rule_based_consolidation.py - Rule-based term consolidation
Source Files
- Vol1 chapter glossaries: quarto/contents/vol1/*/<chapter>_glossary.json
- Vol2 chapter glossaries: quarto/contents/vol2/*/<chapter>_glossary.json
Individual chapter glossaries are the source of truth. Edit those, then rebuild.
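Before editing, it can help to list which chapter glossary files exist for a volume. The sketch below applies the path pattern given above using `pathlib`. The function name `chapter_glossary_files` and the `repo_root` parameter are hypothetical, and it assumes the `quarto/contents/...` paths are relative to the directory passed in.

```python
from pathlib import Path


def chapter_glossary_files(repo_root, volume):
    """Return sorted chapter glossary paths for one volume (e.g. "vol1").

    Matches quarto/contents/<volume>/*/<chapter>_glossary.json. The single
    `*` directory level means files nested deeper, such as the generated
    volume JSON under backmatter/glossary/, are not picked up.
    """
    pattern = f"quarto/contents/{volume}/*/*_glossary.json"
    return sorted(Path(repo_root).glob(pattern))
```

For example, `for path in chapter_glossary_files(".", "vol1"): print(path)` would print each source file to edit before running a rebuild.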
Output Files
- Volume 1 JSON: quarto/contents/vol1/backmatter/glossary/vol1_glossary.json
- Volume 1 page: quarto/contents/vol1/backmatter/glossary/glossary.qmd
- Volume 2 JSON: quarto/contents/vol2/backmatter/glossary/vol2_glossary.json
- Volume 2 page: quarto/contents/vol2/backmatter/glossary/glossary.qmd
Each volume has its own self-contained glossary.