cs249r_book/tools/scripts/docs/FIGURE_CAPTIONS.md
Vijay Janapa Reddi 193ae009b4 docs: update documentation for new build structure and configuration
- Update all documentation to reflect new build/ directory structure
- Update configuration file references from _quarto.yml to _quarto-html.yml and _quarto-pdf.yml
- Update output paths from _book/ to build/html/ and build/pdf/
- Update disk usage commands and maintenance procedures
- Update script documentation to reflect new configuration structure
- Mark legacy cache directories appropriately
2025-07-27 10:48:37 -04:00

Figure Caption Improvement Script

Overview

This script rewrites figure and table captions in the ML Systems textbook using a local LLM served by Ollama. It replaces weak phrasing with direct, educational language and keeps every caption in the required format.

Prerequisites

Software Requirements

# Python dependencies (included in main requirements.txt)
pip install pypandoc pyyaml requests pillow

# Ollama for LLM caption improvement
brew install ollama  # macOS
# or: curl -fsSL https://ollama.ai/install.sh | sh  # Linux

# Download recommended models
ollama pull qwen2.5:7b      # Default model (good balance)
ollama pull gemma2:9b       # High quality alternative
ollama pull llama3.2:3b     # Fast lightweight option

Hardware Requirements

  • 8 GB or more of RAM for LLM processing
  • SSD storage for faster model loading
  • GPU optional but improves performance

Quick Start

# Process all core chapters with default model
python3 scripts/improve_figure_captions.py -d contents/core/

# Use specific model
python3 scripts/improve_figure_captions.py -d contents/core/ -m gemma2:9b

# Process specific files
python3 scripts/improve_figure_captions.py -f contents/core/introduction/introduction.qmd

Command Line Options

Main Modes

All main options have both short and long forms:

Option        Short   Purpose
--improve     -i      LLM caption improvement (default mode)
--build-map   -b      Build content map and save to JSON
--analyze     -a      Quality analysis + file validation
--repair      -r      Fix formatting issues only

Additional Options

Option          Short           Purpose
--model         -m              Specify Ollama model (default: qwen2.5:7b)
--files         -f              Process specific QMD files
--directories   -d              Process directories (follows _quarto-html.yml order)
--save-json     (none listed)   Save detailed content map to JSON
--list-models   (none listed)   List available Ollama models
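
The option tables above can be mirrored in a small `argparse` declaration. This is only a sketch of how such an interface could be declared, not the script's actual source: the function names `build_parser` and `selected_mode`, the mutually exclusive mode group, and the choice of `nargs="+"` for `--files` are assumptions made for illustration. The repeatable `-d` flag follows the "Multiple directories" usage shown in this document.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options."""
    parser = argparse.ArgumentParser(prog="improve_figure_captions.py")

    # Assumption: only one main mode can be selected per run.
    modes = parser.add_mutually_exclusive_group()
    modes.add_argument("-i", "--improve", action="store_true",
                       help="LLM caption improvement (default mode)")
    modes.add_argument("-b", "--build-map", action="store_true",
                       help="Build content map and save to JSON")
    modes.add_argument("-a", "--analyze", action="store_true",
                       help="Quality analysis + file validation")
    modes.add_argument("-r", "--repair", action="store_true",
                       help="Fix formatting issues only")

    parser.add_argument("-m", "--model", default="qwen2.5:7b",
                        help="Ollama model to use")
    # Assumption: -f accepts one or more paths after a single flag.
    parser.add_argument("-f", "--files", nargs="+", default=[],
                        help="Process specific QMD files")
    # -d may be repeated, as in: -d contents/core/ -d contents/frontmatter/
    parser.add_argument("-d", "--directories", action="append", default=[],
                        help="Process directories")
    parser.add_argument("--save-json", action="store_true",
                        help="Save detailed content map to JSON")
    parser.add_argument("--list-models", action="store_true",
                        help="List available Ollama models")
    return parser


def selected_mode(args: argparse.Namespace) -> str:
    """Return the chosen mode, falling back to the documented default."""
    for name in ("build_map", "analyze", "repair"):
        if getattr(args, name):
            return name
    return "improve"
```

With this sketch, `build_parser().parse_args(["-a", "-d", "contents/core/"])` selects the `analyze` mode, and omitting every mode flag falls back to `improve`.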

Usage Examples

Complete Caption Improvement

# Default workflow - improve all captions
python3 scripts/improve_figure_captions.py -d contents/core/

# Equivalent explicit command
python3 scripts/improve_figure_captions.py --improve -d contents/core/

# With different model
python3 scripts/improve_figure_captions.py -i -d contents/core/ -m gemma2:9b

# Multiple directories
python3 scripts/improve_figure_captions.py -d contents/core/ -d contents/frontmatter/

Analysis and Utilities

# Build content map only
python3 scripts/improve_figure_captions.py --build-map -d contents/core/
python3 scripts/improve_figure_captions.py -b -d contents/core/

# Analyze caption quality and validate structure
python3 scripts/improve_figure_captions.py --analyze -d contents/core/
python3 scripts/improve_figure_captions.py -a -d contents/core/

# Fix formatting issues only (no LLM)
python3 scripts/improve_figure_captions.py --repair -d contents/core/
python3 scripts/improve_figure_captions.py -r -d contents/core/

Development and Debugging

# Save detailed JSON output for inspection
python3 scripts/improve_figure_captions.py -d contents/core/ --save-json

# List available Ollama models
python3 scripts/improve_figure_captions.py --list-models

# Process single file for testing
python3 scripts/improve_figure_captions.py -f contents/core/introduction/introduction.qmd -m gemma2:9b

Model Selection Guide

Model         Use Case
qwen2.5:7b    Default - best balance
gemma2:9b     High quality output
llama3.2:3b   Fast processing
mistral:7b    Alternative option

Model Installation

# Install specific models
ollama pull qwen2.5:7b
ollama pull gemma2:9b
ollama pull llama3.2:3b

# Check installed models
ollama list

Caption Quality Standards

Formatting Rules

  • Figures: **Bold Title**: Sentence case explanation.
  • Tables: : **Bold Title**: Sentence case explanation. (note colon prefix)
  • Word limit: Maximum 100 words per caption
  • Language: Strong, direct educational language
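
The formatting rules above are mechanical enough to check in a few lines. The following is a minimal sketch under stated assumptions, not the script's own validator: the function name `check_caption`, the regular expressions, and the way words are counted are all illustrative choices.

```python
import re

MAX_WORDS = 100  # word limit stated in the formatting rules

# Matches "**Bold Title**: Explanation"; the title must be non-empty.
CAPTION_RE = re.compile(r"^\*\*(?P<title>[^*]+)\*\*: (?P<body>\S.*)$", re.DOTALL)

# Assumption: a "word" is a run of word characters, apostrophes, or hyphens.
WORD_RE = re.compile(r"\w[\w'-]*")


def check_caption(caption: str, is_table: bool = False) -> list:
    """Return a list of problems; an empty list means the caption passes."""
    problems = []
    text = caption.strip()

    if is_table:
        # Table captions carry a leading colon before the bold title.
        if text.startswith(":"):
            text = text[1:].lstrip()
        else:
            problems.append("table caption is missing the ':' prefix")

    match = CAPTION_RE.match(text)
    if match is None:
        problems.append("caption does not match '**Bold Title**: Explanation'")
    elif not match.group("body")[0].isupper():
        problems.append("explanation should start with a capital letter")

    word_count = len(WORD_RE.findall(text))
    if word_count > MAX_WORDS:
        problems.append(f"caption has {word_count} words (limit {MAX_WORDS})")
    return problems
```

Returning a list of problems rather than a boolean makes it easy to report every violation for a caption in one pass.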

Language Improvements

The script automatically:

  • Removes weak starters: "Illustrates", "Shows", "Demonstrates"
  • Uses direct language: "Neural networks process..." instead of "This shows how..."
  • Fixes capitalization: Proper sentence case after periods
  • Normalizes spacing: Single spaces, clean formatting
  • Keeps an educational focus: Clear, learning-oriented explanations
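
Several of the cleanups listed above are simple text transformations. The sketch below shows one way they could be written. The helper names are hypothetical and the script's real implementation may differ; in particular, the sentence-case helper is deliberately naive and would also capitalize the word after an abbreviation such as "e.g. ".

```python
import re

# Weak starters named in the list above.
WEAK_STARTERS = ("Illustrates", "Shows", "Demonstrates")


def starts_weak(caption: str) -> bool:
    """True when the explanation opens with one of the weak starters."""
    # Skip an optional ': ' table prefix and '**Title**:' before the first word.
    body = re.sub(r"^\s*(?::\s*)?\*\*[^*]+\*\*:\s*", "", caption)
    first = body.split(maxsplit=1)
    return bool(first) and first[0].rstrip(",.:;").capitalize() in WEAK_STARTERS


def normalize_spacing(text: str) -> str:
    """Collapse runs of whitespace to single spaces and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def fix_sentence_case(text: str) -> str:
    """Capitalize a lowercase letter that follows '. ', '! ' or '? '."""
    return re.sub(r"([.!?] )([a-z])",
                  lambda m: m.group(1) + m.group(2).upper(),
                  text)
```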

Before/After Examples

Before (weak):

Illustrates how machine learning models can serve as amplifiers.

After (strong):

**Amplification Effects**: Machine learning models enable threat actors to scale attacks by automating target identification and payload generation.

Processing Workflow

What the Script Does

  1. Extract: Finds all figures and tables in QMD files (follows _quarto-html.yml order)
  2. Analyze: Builds content map with context extraction
  3. Improve: Uses LLM to generate better captions with quality validation
  4. Update: Applies improvements directly to QMD files
  5. Validate: Ensures proper formatting and structure
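
The "Improve" step in the workflow above comes down to sending a caption and its context to the local Ollama server. The sketch below uses Ollama's `/api/generate` HTTP endpoint with streaming disabled. Whether the script calls this endpoint or another one is not stated here, and the prompt wording and the function names `build_request` and `improve_caption` are assumptions made for illustration.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, caption: str, context: str) -> dict:
    """Build the JSON body for a non-streaming generate call."""
    # Hypothetical prompt; the script's real prompt is not documented here.
    prompt = (
        "Rewrite the caption below as '**Bold Title**: Sentence case explanation.' "
        "Use at most 100 words and direct, educational language.\n\n"
        f"Context: {context}\n"
        f"Caption: {caption}\n"
    )
    return {"model": model, "prompt": prompt, "stream": False}


def improve_caption(model: str, caption: str, context: str,
                    timeout: float = 120.0) -> str:
    """Send the request to a running Ollama server and return its reply."""
    body = json.dumps(build_request(model, caption, context)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # With "stream": False the server returns a single JSON object.
        return json.loads(response.read())["response"].strip()
```

Keeping `build_request` separate from the network call makes the prompt easy to inspect and test without a running server.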

Content Map Structure

The script builds a content map of every figure and table it finds. On the core chapters, the map recorded:

  • 270 figures (Markdown, TikZ, and code-block figures)
  • 92 tables, detected from their caption lines
  • Context for each item, extracted at paragraph level
  • No extraction failures (100% success rate)

Troubleshooting

Common Issues

Ollama Connection Problems

# Check if Ollama is running
curl http://localhost:11434/api/tags

# Start Ollama service
ollama serve

# Check available models
ollama list
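
The same connection check can be done from Python, which is handy inside a pre-flight step. The sketch below queries the `/api/tags` endpoint used in the `curl` command above and lists the installed model names. The function names are hypothetical, and returning `None` for an unreachable server is an illustrative design choice.

```python
import json
import urllib.error
import urllib.request

TAGS_URL = "http://localhost:11434/api/tags"


def model_names(tags_payload: dict) -> list:
    """Extract model names from an /api/tags response body."""
    return [model["name"] for model in tags_payload.get("models", [])]


def installed_models(url: str = TAGS_URL, timeout: float = 5.0):
    """Return installed model names, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return model_names(json.load(response))
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a non-JSON reply.
        return None
```

A `None` result suggests starting the server with `ollama serve`; an empty list means the server is running but no models have been pulled yet.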

Extraction Failures

# Analyze extraction issues
python3 scripts/improve_figure_captions.py --analyze -d contents/core/

# Build content map to see details
python3 scripts/improve_figure_captions.py --build-map -d contents/core/

Quality Issues

# Try different model
python3 scripts/improve_figure_captions.py -d contents/core/ -m gemma2:9b

# Check specific file
python3 scripts/improve_figure_captions.py -f problematic_file.qmd --save-json

Performance Optimization

  • Use qwen2.5:7b for best speed/quality balance
  • Process single files for testing: -f filename.qmd
  • Use llama3.2:3b for fastest processing
  • Enable JSON output only when debugging: --save-json

Output Files

Generated Files

content_map.json           # Detailed content structure (if --save-json)
improvements_YYYYMMDD_HHMMSS.json  # Summary of changes made

Content Map JSON Layout

{
  "figures": {
    "fig-ai-timeline": {
      "qmd_file": "contents/core/introduction/introduction.qmd",
      "type": "tikz",
      "original_caption": "...",
      "new_caption": "...",
      "improved": true
    }
  },
  "tables": { ... },
  "metadata": {
    "extraction_stats": {
      "figures_found": 270,
      "tables_found": 92,
      "extraction_failures": 0,
      "success_rate": 100.0
    }
  }
}
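
A saved content map can be inspected with a few lines of Python. This sketch assumes the layout shown above (`figures`, `tables`, and `metadata.extraction_stats`). The function names and the shape of the summary are illustrative and not part of the script.

```python
import json


def load_content_map(path: str = "content_map.json") -> dict:
    """Read a content map written with --save-json."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def summarize(content_map: dict) -> dict:
    """Count entries and list the figure IDs whose captions were improved."""
    figures = content_map.get("figures", {})
    tables = content_map.get("tables", {})
    stats = content_map.get("metadata", {}).get("extraction_stats", {})
    return {
        "figures": len(figures),
        "tables": len(tables),
        "improved_figures": sorted(
            fig_id for fig_id, entry in figures.items() if entry.get("improved")
        ),
        "extraction_failures": stats.get("extraction_failures"),
    }
```

Calling `summarize(load_content_map())` after a run gives a quick view of which figures changed before reviewing the QMD diffs.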

Integration with Book Build

Quarto Compatibility

The script is designed to fit into Quarto's build process without extra steps:

  • Preserves: All Quarto attributes ({#fig-id .class})
  • Maintains: Reference links and cross-references
  • Follows: _quarto-html.yml chapter ordering
  • Supports: TikZ, Markdown, and code block figures
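
Preserving attributes means changing only the caption text and leaving the image path and the `{#fig-id .class}` block untouched. The sketch below shows the idea for Markdown image figures only; TikZ and code-block figures need different handling. The regular expression and the function name `replace_caption` are assumptions, not the script's implementation.

```python
import re

# Matches ![caption](path){attributes}; the attribute block is optional.
FIGURE_RE = re.compile(
    r"!\[(?P<caption>.*?)\]"     # caption text
    r"\((?P<path>[^)\s]+)\)"      # image path
    r"(?P<attrs>\{[^}]*\})?"      # Quarto attributes such as {#fig-id .class}
)


def replace_caption(line: str, fig_id: str, new_caption: str) -> str:
    """Swap the caption of the figure with the given ID, keeping the rest."""
    def swap(match: re.Match) -> str:
        attrs = match.group("attrs") or ""
        # Only touch the figure whose attribute block carries the target ID.
        if f"#{fig_id}" not in attrs.strip("{}").split():
            return match.group(0)
        return f"![{new_caption}]({match.group('path')}){attrs}"

    return FIGURE_RE.sub(swap, line)
```

Because the attribute block is copied back verbatim, cross-references such as `@fig-ai-timeline` keep resolving after the caption changes.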

Build Process

# 1. Improve captions
python3 scripts/improve_figure_captions.py -d contents/core/

# 2. Build book normally
quarto render

# 3. Check results
open build/html/index.html

Best Practices

Development Workflow

  1. Test on single file first: -f filename.qmd
  2. Use analyze mode to check structure: --analyze
  3. Try different models for quality comparison
  4. Save JSON output for debugging: --save-json
  5. Commit script changes as usual, but review the diff of every modified QMD file before committing it

Production Workflow

  1. Use default settings for consistent results
  2. Process all core chapters: -d contents/core/
  3. Verify improvements before committing QMD files
  4. Test Quarto build after caption updates

Quality Assurance

  • Automatic validation: 100-word limit, proper formatting
  • Language improvements: Strong, educational tone
  • Context preservation: Maintains technical accuracy
  • Format consistency: Proper table/figure formatting

Success Metrics

Extraction Quality

  • 100% extraction success rate on the core chapters (270 figures, 92 tables found)
  • Format detection covers TikZ, Markdown, and code-block figures
  • Table parsing handles the : **bold**: caption format
  • Context-aware processing (paragraph-level analysis)

Caption Quality

  • Strong language (eliminates weak starters)
  • Educational focus (clear learning objectives)
  • Proper formatting (consistent spacing, capitalization)
  • Technical accuracy (preserves domain knowledge)

Last Updated: December 2024
Tested With: Quarto 1.5+, Ollama 0.3+, Python 3.8+
Script Version: 2.0 (streamlined options)