cs249r_book/tools/scripts/genai/README_ALT_TEXT.md
Vijay Janapa Reddi c20c73508b feat(accessibility): Add GenAI-powered alt-text generation tools
- Add generate_alt_text.py script for automated image alt-text generation
- Add README_ALT_TEXT.md with detailed usage instructions
- Add QUICK_START_ALT_TEXT.md for quick reference
- Uses Google Gemini API to generate descriptive alt-text for figures

Related to accessibility improvements for image descriptions.
Work in progress - requires GitHub issue tracking.
2025-11-09 16:53:44 -05:00


Alt Text Generation Tool

This tool automatically generates accessible alt text for images in the Machine Learning Systems book by combining local builds with AI vision models.

Overview

The alt text generator works by:

  1. Building locally: Uses ./binder html <chapter> to build the specified chapter
  2. Parsing HTML: Extracts all <figure> elements with their IDs, captions, and context
  3. Image analysis: Sends images to a vision model (OpenAI GPT-4 Vision or Ollama's llava)
  4. Source update: Finds corresponding figures in .qmd files and adds fig-alt attributes

Why This Approach?

We scrape from the built HTML rather than directly from source because:

  • Many figures use TikZ code, which is unreadable without rendering
  • We get the actual visual output that readers see
  • Context (captions, sections) is cleanly extracted from rendered HTML
  • Figure IDs give a direct key for matching each rendered figure back to its source

Prerequisites

For OpenAI (Cloud)

export OPENAI_API_KEY=your_api_key_here

For Ollama (Local)

  1. Install Ollama from https://ollama.ai
  2. Pull the llava vision model:
ollama pull llava

Python Dependencies

# From the repository root
cd tools/scripts/genai
pip install -r requirements.txt

Usage

Using Ollama

# Process the introduction chapter
python generate_alt_text.py --chapter intro --provider ollama

# Dry run to see what would happen without modifying files
python generate_alt_text.py --chapter intro --provider ollama --dry-run

Using OpenAI

# Process the introduction chapter with OpenAI
python generate_alt_text.py --chapter intro --provider openai

Advanced Options

# Use a different Ollama model
python generate_alt_text.py --chapter intro --provider ollama --model llava:13b

# Use a different OpenAI model
python generate_alt_text.py --chapter intro --provider openai --model gpt-4-vision-preview

# Specify Ollama URL (if not localhost)
python generate_alt_text.py --chapter intro --provider ollama --ollama-url http://192.168.1.100:11434
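
The flags used in the commands above could be declared roughly as follows. This is a minimal sketch, not the script's actual parser: the flag names come from this README, while the defaults (ollama as the default provider, http://localhost:11434 as the default URL) and the function name build_parser are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags documented in this README."""
    parser = argparse.ArgumentParser(
        description="Generate alt text for the figures in one chapter."
    )
    parser.add_argument("--chapter", required=True,
                        help="Chapter identifier, e.g. intro")
    # Assumption: ollama is the default provider.
    parser.add_argument("--provider", choices=["ollama", "openai"],
                        default="ollama")
    # None means "use the provider's default model".
    parser.add_argument("--model", default=None)
    # Assumption: the default URL is the local Ollama port.
    parser.add_argument("--ollama-url", default="http://localhost:11434")
    parser.add_argument("--dry-run", action="store_true",
                        help="Report changes without modifying files")
    return parser


# Parse an explicit argument list so the sketch runs without a command line.
args = build_parser().parse_args(
    ["--chapter", "intro", "--provider", "ollama", "--dry-run"]
)
print(args.chapter, args.provider, args.ollama_url, args.dry_run)
```

argparse converts --ollama-url and --dry-run into the attributes args.ollama_url and args.dry_run.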

Chapter Names

Common chapter identifiers:

  • intro or introduction - Introduction chapter
  • ml_systems - ML Systems chapter
  • dl_primer - Deep Learning Primer
  • ai_workflow - AI Workflow
  • etc.

The script will search for matching files in the quarto/contents/core/ directory.
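
One way such a lookup could work is sketched below. The matching rule (case-insensitive prefix match on the file stem or its parent directory) is an assumption chosen so that intro finds introduction.qmd; the real script may match differently, and find_chapter_files is a hypothetical name.

```python
from pathlib import Path


def find_chapter_files(chapter: str, core_dir: Path) -> list[Path]:
    """Return .qmd files whose stem or parent directory starts with `chapter`.

    Assumption: prefix matching, so "intro" matches "introduction.qmd".
    """
    needle = chapter.lower()
    matches = []
    for qmd in sorted(core_dir.rglob("*.qmd")):
        stem_hit = qmd.stem.lower().startswith(needle)
        dir_hit = qmd.parent.name.lower().startswith(needle)
        if stem_hit or dir_hit:
            matches.append(qmd)
    return matches
```

Called as find_chapter_files("intro", Path("quarto/contents/core")), it returns every candidate rather than the first, so the caller can report an ambiguous chapter name.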

How It Works

1. Building the Chapter

./binder html intro

This builds just the introduction chapter to quarto/_build/html/.

2. Extracting Figures

The script parses the HTML looking for:

<figure class="quarto-float quarto-float-fig figure">
  <div aria-describedby="fig-ai-timeline-caption-...">
    <img src="introduction_files/mediabag/25cf57367...svg" class="img-fluid figure-img">
  </div>
  <figcaption id="fig-ai-timeline-caption-...">
    Figure 2: AI Development Timeline: ...
  </figcaption>
</figure>

It extracts:

  • Figure ID: fig-ai-timeline (from the caption ID)
  • Image path: introduction_files/mediabag/25cf57367...svg
  • Caption text: "Figure 2: AI Development Timeline: ..."
  • Section heading: From the nearest preceding <h1>, <h2>, or <h3>
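
The extraction step can be sketched with the standard library's html.parser. This is an illustrative approximation, not the script's parser: the class name FigureExtractor is hypothetical, the sample HTML uses placeholder values (the -0000 suffix, example.svg, and the heading text), and the rule "figure ID is everything before -caption in the figcaption ID" is an assumption based on the snippet above.

```python
from html.parser import HTMLParser


class FigureExtractor(HTMLParser):
    """Collect figure ID, image path, caption, and nearest preceding heading."""

    def __init__(self):
        super().__init__()
        self.figures = []      # finished figure records
        self._current = None   # record for the <figure> being parsed
        self._capture = None   # "heading" or "caption" while collecting text
        self._buffer = []
        self._heading = ""     # text of the most recent <h1>/<h2>/<h3>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("h1", "h2", "h3"):
            self._capture, self._buffer = "heading", []
        elif tag == "figure":
            self._current = {"id": None, "image": None,
                             "caption": "", "section": self._heading}
        elif self._current is not None and tag == "img":
            if self._current["image"] is None:
                self._current["image"] = attrs.get("src")
        elif self._current is not None and tag == "figcaption":
            caption_id = attrs.get("id") or ""
            # Assumption: caption IDs look like "<figure-id>-caption-<suffix>".
            self._current["id"] = caption_id.split("-caption")[0] or None
            self._capture, self._buffer = "caption", []

    def handle_data(self, data):
        if self._capture:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        text = " ".join("".join(self._buffer).split())  # collapse whitespace
        if tag in ("h1", "h2", "h3") and self._capture == "heading":
            self._heading, self._capture = text, None
        elif tag == "figcaption" and self._capture == "caption":
            self._current["caption"], self._capture = text, None
        elif tag == "figure" and self._current is not None:
            self.figures.append(self._current)
            self._current = None


# Placeholder HTML shaped like the snippet above (IDs and paths are made up).
SAMPLE_HTML = """
<h2>AI Evolution</h2>
<figure class="quarto-float quarto-float-fig figure">
  <div aria-describedby="fig-ai-timeline-caption-0000">
    <img src="introduction_files/mediabag/example.svg" class="img-fluid figure-img">
  </div>
  <figcaption id="fig-ai-timeline-caption-0000">
    Figure 2: AI Development Timeline: ...
  </figcaption>
</figure>
"""

extractor = FigureExtractor()
extractor.feed(SAMPLE_HTML)
for figure in extractor.figures:
    print(figure)
```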

3. Generating Alt Text

For each image, the script:

  • Encodes the image as base64
  • Sends it to the vision model with context (caption, section)
  • Gets back descriptive alt text following accessibility guidelines
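
For the Ollama provider, the request body could be assembled along these lines. The payload fields (model, prompt, images, stream) follow Ollama's /api/generate endpoint, which accepts base64-encoded images; the prompt wording and the function name build_ollama_payload are assumptions, not the script's own text. The sketch only builds the payload and does not send it.

```python
import base64
import json


def build_ollama_payload(image_bytes: bytes, caption: str, section: str,
                         model: str = "llava") -> dict:
    """Build a JSON-serializable body for Ollama's /api/generate endpoint."""
    image_b64 = base64.b64encode(image_bytes).decode("ascii")
    # Hypothetical prompt wording; the real prompt may differ.
    prompt = (
        "Write alt text for this figure in 1-2 sentences.\n"
        f"Section: {section}\n"
        f"Caption: {caption}\n"
        "Describe what is visually important rather than repeating the caption."
    )
    return {
        "model": model,
        "prompt": prompt,
        "images": [image_b64],  # Ollama expects a list of base64 strings
        "stream": False,        # ask for a single JSON response
    }


payload = build_ollama_payload(
    b"\x89PNG", "Figure 2: AI Development Timeline", "Introduction"
)
print(json.dumps(payload)[:80])
```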

Prompt guidelines:

  • Be concise (1-2 sentences, ideally under 125 characters)
  • Describe what's visually important, not what's already obvious from the caption
  • Focus on key information the image conveys
  • For diagrams: structure, flow, relationships
  • For graphs: trends, comparisons, insights
  • Don't start with "Image of" or "Figure showing"
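
Two of these guidelines are mechanical enough to check after generation. The helper below is a hypothetical addition, not part of the script (quality validation is listed under Future Enhancements); it flags empty text, text over 125 characters, and the two discouraged openers named above.

```python
MAX_CHARS = 125
BANNED_OPENERS = ("image of", "figure showing")


def check_alt_text(alt_text: str) -> list[str]:
    """Return a list of guideline violations; an empty list means none found."""
    problems = []
    text = alt_text.strip()
    if not text:
        problems.append("empty")
    if len(text) > MAX_CHARS:
        problems.append(f"longer than {MAX_CHARS} characters ({len(text)})")
    # str.startswith accepts a tuple of prefixes.
    if text.lower().startswith(BANNED_OPENERS):
        problems.append("starts with a redundant opener")
    return problems


print(check_alt_text("Image of a timeline"))
```

Because the 125-character target is described as ideal rather than mandatory, a caller would probably treat the length finding as a warning for human review, not a hard failure.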

4. Updating Source Files

The script searches .qmd files for figure references like:

  • ![caption](image.png){#fig-ai-timeline}
  • #| label: fig-ai-timeline
  • id="fig-ai-timeline"

It adds or updates the fig-alt attribute:

Before:

![AI Development Timeline](images/timeline.svg){#fig-ai-timeline}

After:

![AI Development Timeline](images/timeline.svg){#fig-ai-timeline fig-alt="Timeline showing evolution from symbolic AI in the 1950s through neural networks to modern large language models"}

Or for block figures:

#| label: fig-ai-timeline
#| fig-cap: "AI Development Timeline"
#| fig-alt: "Timeline showing evolution from symbolic AI in the 1950s through neural networks to modern large language models"
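
Both update forms can be approximated with regular expressions. The sketch below is an assumption about how such an update might be done, not the script's code: add_fig_alt is a hypothetical name, double quotes in the alt text are swapped for single quotes to keep the attribute well-formed, and for block figures it assumes the label line comes first among the #| options.

```python
import re


def add_fig_alt(qmd_text: str, fig_id: str, alt_text: str) -> str:
    """Add or replace fig-alt for `fig_id` in inline and block figures."""
    escaped = alt_text.replace('"', "'")
    # {#fig-id ...} where the ID is followed by whitespace or the closing brace.
    inline = re.compile(r"\{#" + re.escape(fig_id) + r"(?=[\s}])([^}]*)\}")
    label = re.compile(r"^#\|\s*label:\s*" + re.escape(fig_id) + r"\s*$")

    def inline_repl(match):
        # Drop any existing fig-alt, then append the new one.
        attrs = re.sub(r'\s*fig-alt="[^"]*"', "", match.group(1))
        return "{#" + fig_id + attrs + ' fig-alt="' + escaped + '"}'

    lines = qmd_text.split("\n")
    out = []
    i = 0
    while i < len(lines):
        line = lines[i]
        if label.match(line):
            out.append(line)
            i += 1
            # Copy the remaining "#|" option lines, skipping a stale fig-alt.
            while i < len(lines) and lines[i].startswith("#|"):
                if not re.match(r"^#\|\s*fig-alt:", lines[i]):
                    out.append(lines[i])
                i += 1
            out.append('#| fig-alt: "' + escaped + '"')
            continue
        out.append(inline.sub(inline_repl, line))
        i += 1
    return "\n".join(out)


before = "![AI Development Timeline](images/timeline.svg){#fig-ai-timeline}"
print(add_fig_alt(before, "fig-ai-timeline", "Timeline of AI eras"))
```

Removing an existing fig-alt before appending the new one makes the function idempotent, so rerunning it on an already updated chapter does not duplicate attributes.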

Output

The script provides:

  • Progress logging to console and generate_alt_text.log
  • Summary statistics at the end:
    • Total figures found
    • Alt text generated
    • Files updated
    • Errors encountered
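
The logging and summary behaviour could be wired up along these lines. This is a sketch under assumptions: the names RunStats and configure_logging are hypothetical, and the log format and summary layout are modelled on the sample output later in this README rather than taken from the script.

```python
import logging
from dataclasses import dataclass


@dataclass
class RunStats:
    """Counters reported at the end of a run."""
    figures_found: int = 0
    alt_text_generated: int = 0
    files_updated: int = 0
    errors: int = 0

    def summary(self) -> str:
        bar = "=" * 60
        return "\n".join([
            bar,
            "ALT TEXT GENERATION SUMMARY",
            bar,
            f"Total figures found: {self.figures_found}",
            f"Alt text generated: {self.alt_text_generated}",
            f"Files updated: {self.files_updated}",
            f"Errors: {self.errors}",
            bar,
        ])


def configure_logging(log_path: str = "generate_alt_text.log") -> logging.Logger:
    """Log INFO and above to both the console and `log_path`."""
    logger = logging.getLogger("alt_text")
    if logger.handlers:  # already configured; avoid duplicate output
        return logger
    logger.setLevel(logging.INFO)
    formatter = logging.Formatter(
        "%(asctime)s - %(levelname)s - %(message)s",
        datefmt="%Y-%m-%d %H:%M:%S",
    )
    for handler in (logging.StreamHandler(), logging.FileHandler(log_path)):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger


print(RunStats(figures_found=2, alt_text_generated=2, files_updated=2).summary())
```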

Troubleshooting

"Could not find HTML file"

  • Make sure the chapter name is correct
  • Check that the build succeeded
  • Look in quarto/_build/html/contents/core/ for the HTML file

"Could not find image file"

  • The script tries multiple locations for images
  • Check the build output directory structure
  • Image might be in a subdirectory or mediabag
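
When debugging this error by hand, a lookup like the following can show where an image actually landed. The candidate locations and their order are assumptions for illustration (the README does not list which locations the script tries), and resolve_image is a hypothetical name.

```python
from pathlib import Path
from typing import Optional


def resolve_image(src: str, html_file: Path, build_root: Path) -> Optional[Path]:
    """Try a few plausible locations for an <img src> value."""
    candidates = [
        html_file.parent / src,  # relative to the chapter's HTML file
        build_root / src,        # relative to the build output root
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    # Fall back to searching the whole build tree by file name.
    for hit in sorted(build_root.rglob(Path(src).name)):
        if hit.is_file():
            return hit
    return None
```

The file-name fallback can pick the wrong file if two chapters ship images with the same name, so it is best used for diagnosis rather than as a silent default.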

"Could not find figure in .qmd file"

  • The figure ID in HTML might not match the source
  • Check the figure ID format in your .qmd files
  • Try searching manually: grep -r "fig-your-id" quarto/contents/core/

Ollama connection error

# Make sure Ollama is running
ollama serve

# Test it's working
curl http://localhost:11434/api/tags
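
The same check can be done from Python before starting a long run. This is a small sketch using only the standard library; ollama_is_up is a hypothetical helper, and it treats any connection failure, HTTP error, or non-JSON response as "not reachable".

```python
import http.client
import json
import urllib.error
import urllib.request


def ollama_is_up(base_url: str = "http://localhost:11434",
                 timeout: float = 3.0) -> bool:
    """Return True if GET <base_url>/api/tags answers with HTTP 200 and JSON."""
    url = base_url.rstrip("/") + "/api/tags"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            json.load(response)  # raises ValueError on a non-JSON body
            return response.status == 200
    except (urllib.error.URLError, OSError, ValueError,
            http.client.HTTPException):
        return False
```

Pass the value given to --ollama-url as base_url when Ollama runs on another machine.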

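OpenAI API errors

  • Confirm that OPENAI_API_KEY is exported in the current shell (see Prerequisites)
  • If you pass --model, check that the chosen model accepts image input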

Best Practices

  1. Start with a dry run: Always test with --dry-run first
  2. One chapter at a time: Process chapters individually to catch issues early
  3. Review generated text: Alt text quality matters for accessibility
  4. Use Ollama for testing: Free and fast for iterating on the workflow
  5. Use OpenAI for production: Generally produces higher quality descriptions
  6. Commit incrementally: Commit each chapter separately for easier review

Future Enhancements

Potential improvements:

  • Batch processing of multiple chapters
  • Alt text quality validation
  • Support for updating existing alt text selectively
  • Integration with the binder CLI
  • Cache generated alt text to avoid regenerating
  • Support for other image formats (PDF figures, etc.)

Examples

Successful Run

$ python generate_alt_text.py --chapter intro --provider ollama

2025-10-24 10:30:00 - INFO - Starting alt text generation for chapter: intro
2025-10-24 10:30:00 - INFO - Provider: ollama, Model: llava
2025-10-24 10:30:05 - INFO - Successfully built chapter: intro
2025-10-24 10:30:06 - INFO - Extracting figures from: .../introduction.html
2025-10-24 10:30:06 - INFO - Found figure: fig-ai-timeline
2025-10-24 10:30:06 - INFO - Found figure: fig-ml-workflow
2025-10-24 10:30:06 - INFO - Extracted 2 figures
2025-10-24 10:30:10 - INFO - Generating alt text for fig-ai-timeline using Ollama
2025-10-24 10:30:15 - INFO - Generated alt text: Timeline showing evolution...
2025-10-24 10:30:20 - INFO - Adding alt text to .../introduction.qmd for fig-ai-timeline
2025-10-24 10:30:20 - INFO - Successfully updated .../introduction.qmd

============================================================
ALT TEXT GENERATION SUMMARY
============================================================
Total figures found: 2
Alt text generated: 2
Files updated: 2
Errors: 0
============================================================

✅ Successfully processed chapter: intro
📝 Check the log file for details: generate_alt_text.log

Dry Run

$ python generate_alt_text.py --chapter intro --provider ollama --dry-run

...
[DRY RUN] Would update .../introduction.qmd
[DRY RUN] Line 45: ![AI Timeline](images/timeline.svg){#fig-ai-timeline}
[DRY RUN] New line: ![AI Timeline](images/timeline.svg){#fig-ai-timeline fig-alt="Timeline..."}
...

⚠️  DRY RUN MODE - No files were actually modified
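
A dry-run guard around the file write might look like the sketch below. It is an assumption about the mechanism, not the script's code: write_update is a hypothetical name, and it prints a unified diff with a [DRY RUN] prefix, which differs from the line-by-line format shown above.

```python
import difflib
from pathlib import Path


def write_update(path: Path, new_text: str, dry_run: bool) -> bool:
    """Write `new_text` to `path` unless `dry_run` is set.

    Returns True only if the file on disk was actually changed.
    """
    old_text = path.read_text(encoding="utf-8")
    if old_text == new_text:
        return False
    if dry_run:
        print(f"[DRY RUN] Would update {path}")
        diff = difflib.unified_diff(
            old_text.splitlines(), new_text.splitlines(), lineterm="", n=0
        )
        for line in diff:
            print(f"[DRY RUN] {line}")
        return False
    path.write_text(new_text, encoding="utf-8")
    return True
```

Keeping the dry-run check in the single function that writes files means every other step (build, extraction, generation) runs identically in both modes.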

Contributing

If you improve the alt text generation quality or workflow:

  1. Test thoroughly with multiple chapters
  2. Update this README with your changes
  3. Add examples of improvements
  4. Document any new dependencies or requirements

Other genai tools in this directory:

  • header_update.py - Update section headers
  • quizzes.py - Generate quizzes
  • footnote_assistant.py - Add scholarly footnotes
  • fix_dashes.py - Fix dash usage in text