- Create vol1/backmatter/glossary with 462 Vol1-only terms
- Create vol2/backmatter/glossary with 250 Vol2-only terms
- Remove combined glossary (each volume is now self-contained)
- Update build_global_glossary.py to generate per-volume JSONs
- Update generate_glossary.py to create per-volume QMD files
- Update Quarto sidebar to link to volume-specific glossaries
- Remove obsolete data/ folder (glossary data now in backmatter)
- Update glossary documentation (README.md, ORGANIZATION.md)

Note: Vol2 glossary has some broken refs from pre-existing data issues in edge_intelligence_glossary.json (references ondevice_learning chapter)
# Glossary Management Scripts

Scripts for managing the ML Systems textbook glossary system.

## Quick Commands

### Full Rebuild (when chapters change)
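```bash
# Run from the repository root; replace this path with your local checkout.
cd /Users/VJ/GitHub/MLSysBook
# 1. Aggregate the chapter glossaries into per-volume JSONs.
python3 book/tools/scripts/glossary/build_global_glossary.py
# 2. Generate the glossary.qmd pages from the volume JSONs.
python3 book/tools/scripts/glossary/generate_glossary.py
```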
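
The same two steps can also be driven from Python, which makes the ordering and the stop-on-failure behavior explicit. This is only a sketch and not part of the repository: `rebuild_commands` and `run_full_rebuild` are hypothetical helper names, and it assumes the script paths above are relative to the repository root.

```python
import subprocess
import sys
from pathlib import Path

# Script paths as given in this README, relative to the repository root.
SCRIPTS = [
    "book/tools/scripts/glossary/build_global_glossary.py",
    "book/tools/scripts/glossary/generate_glossary.py",
]


def rebuild_commands(python=sys.executable):
    """Return the full-rebuild commands in the order they must run."""
    return [[python, script] for script in SCRIPTS]


def run_full_rebuild(repo_root):
    """Run both steps from repo_root (hypothetical helper); stop on the first failure."""
    for cmd in rebuild_commands():
        # check=True raises CalledProcessError on a non-zero exit, so the page
        # generator never runs after a failed aggregation step.
        subprocess.run(cmd, cwd=Path(repo_root), check=True)
```

Calling `run_full_rebuild("/path/to/MLSysBook")` would then be equivalent to the shell commands above.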

### Generate Specific Volume
```bash
python3 book/tools/scripts/glossary/generate_glossary.py --volume vol1
python3 book/tools/scripts/glossary/generate_glossary.py --volume vol2
```

## Data Flow

```
Chapter QMDs → Agent → Individual JSONs → build_global_glossary.py → Volume JSONs → generate_glossary.py → glossary.qmd
```
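
To make the "Individual JSONs → Volume JSONs" step concrete, here is a rough sketch of what an aggregation pass could look like. It is not the code in `build_global_glossary.py`. The schema (a top-level `"terms"` list whose entries have `"term"` and `"definition"` keys), the first-definition-wins rule, and the function name `merge_chapter_glossaries` are all assumptions made for illustration.

```python
import json
from pathlib import Path


def merge_chapter_glossaries(chapter_files):
    """Merge chapter glossary files into one alphabetized list of entries.

    Assumed schema (hypothetical): {"terms": [{"term": ..., "definition": ...}]}.
    """
    merged = {}
    # Sort the paths so the result does not depend on the input order.
    for path in sorted(Path(p) for p in chapter_files):
        data = json.loads(path.read_text(encoding="utf-8"))
        for entry in data.get("terms", []):
            key = entry["term"].strip().lower()
            # Keep the first definition seen for a term; the real
            # consolidation scripts may resolve duplicates differently.
            merged.setdefault(key, entry)
    return sorted(merged.values(), key=lambda e: e["term"].strip().lower())
```

Deduplicating on a normalized key means a term defined in several chapters appears once per volume.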

## Scripts

- **`build_global_glossary.py`** - Main aggregation script (chapter JSONs → volume JSONs)
- **`generate_glossary.py`** - Page generator (volume JSONs → volume glossary.qmd files)
- **`clean_master_glossary.py`** - Legacy cleanup script
- **`smart_consolidation.py`** - Advanced term consolidation
- **`rule_based_consolidation.py`** - Rule-based term consolidation

## Source Files

- **Vol1 chapter glossaries**: `quarto/contents/vol1/*/<chapter>_glossary.json`
- **Vol2 chapter glossaries**: `quarto/contents/vol2/*/<chapter>_glossary.json`

Individual chapter glossaries are the source of truth. Edit those, then rebuild.
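
To see which chapter files a rebuild would draw from, the path patterns above translate directly into a glob. This is a small sketch under that assumption. `find_chapter_glossaries` is a hypothetical helper, not a function from the scripts in this directory.

```python
from pathlib import Path


def find_chapter_glossaries(contents_root, volume):
    """List the chapter glossary JSONs for one volume, e.g. volume="vol1".

    Mirrors the pattern quarto/contents/<volume>/*/<chapter>_glossary.json.
    """
    volume_dir = Path(contents_root) / volume
    # Match one directory level only, so the generated file under
    # backmatter/glossary/ (two levels down) is not picked up as a source.
    return sorted(volume_dir.glob("*/*_glossary.json"))
```

For example, `find_chapter_glossaries("quarto/contents", "vol1")` returns the Vol1 source files in sorted order.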

## Output Files

- **Volume 1 JSON**: `quarto/contents/vol1/backmatter/glossary/vol1_glossary.json`
- **Volume 1 page**: `quarto/contents/vol1/backmatter/glossary/glossary.qmd`
- **Volume 2 JSON**: `quarto/contents/vol2/backmatter/glossary/vol2_glossary.json`
- **Volume 2 page**: `quarto/contents/vol2/backmatter/glossary/glossary.qmd`

Each volume has its own self-contained glossary.
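
After a rebuild, a quick sanity check is to confirm that the four files listed above exist and that each volume JSON parses. The following is an illustrative sketch only. `glossary_outputs` and `check_outputs` are hypothetical names, and `contents_root` is assumed to point at the `quarto/contents` directory.

```python
import json
from pathlib import Path


def glossary_outputs(contents_root, volume):
    """Return the (JSON, page) output paths listed above for one volume."""
    out_dir = Path(contents_root) / volume / "backmatter" / "glossary"
    return out_dir / f"{volume}_glossary.json", out_dir / "glossary.qmd"


def check_outputs(contents_root, volumes=("vol1", "vol2")):
    """Return a list of problems; an empty list means every output was found."""
    problems = []
    for volume in volumes:
        json_path, page_path = glossary_outputs(contents_root, volume)
        for path in (json_path, page_path):
            if not path.is_file():
                problems.append(f"missing: {path}")
        if json_path.is_file():
            try:
                json.loads(json_path.read_text(encoding="utf-8"))
            except json.JSONDecodeError as exc:
                problems.append(f"invalid JSON in {json_path}: {exc}")
    return problems
```

This only checks that the files are present and that the JSON is well formed. It does not validate glossary content or cross-references.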