# Alt Text Generation Tool

This tool automatically generates accessible alt text for images in the Machine Learning Systems book by combining local builds with AI vision models.

## Overview

The alt text generator works by:

1. **Building locally**: Uses `./binder html <chapter>` to build the specified chapter
2. **Parsing HTML**: Extracts all `<figure>` elements with their IDs, captions, and context
3. **Image analysis**: Sends images to a vision model (OpenAI GPT-4 Vision or Ollama's llava)
4. **Source update**: Finds corresponding figures in `.qmd` files and adds `fig-alt` attributes
## Why This Approach?
We scrape from the built HTML rather than directly from source because:
- Many figures use TikZ code, which is unreadable without rendering
- We get the actual visual output that readers see
- Context (captions, sections) is cleanly extracted from rendered HTML
- Figure IDs provide perfect matching back to source
## Prerequisites

### For OpenAI (Cloud)

```bash
export OPENAI_API_KEY=your_api_key_here
```

### For Ollama (Local)

1. Install Ollama from https://ollama.ai
2. Pull the llava vision model:

```bash
ollama pull llava
```
### Python Dependencies

From the repository root:

```bash
cd tools/scripts/genai
pip install -r requirements.txt
```
## Usage

### Basic Usage (Ollama, recommended for testing)

```bash
# Process the introduction chapter
python generate_alt_text.py --chapter intro --provider ollama

# Dry run to see what would happen without modifying files
python generate_alt_text.py --chapter intro --provider ollama --dry-run
```

### Using OpenAI

```bash
# Process the introduction chapter with OpenAI
python generate_alt_text.py --chapter intro --provider openai
```

### Advanced Options

```bash
# Use a different Ollama model
python generate_alt_text.py --chapter intro --provider ollama --model llava:13b

# Use a different OpenAI model
python generate_alt_text.py --chapter intro --provider openai --model gpt-4-vision-preview

# Specify Ollama URL (if not localhost)
python generate_alt_text.py --chapter intro --provider ollama --ollama-url http://192.168.1.100:11434
```
## Chapter Names

Common chapter identifiers:

- `intro` or `introduction` - Introduction chapter
- `ml_systems` - ML Systems chapter
- `dl_primer` - Deep Learning Primer
- `ai_workflow` - AI Workflow
- etc.

The script will search for matching files in the `quarto/contents/core/` directory (see the lookup sketch below).
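A minimal, hypothetical sketch of how that lookup could work; the function name and matching rule are illustrative, not the script's actual implementation:

```python
# Hypothetical chapter-to-file lookup; generate_alt_text.py's real logic may differ.
from pathlib import Path

def find_chapter_qmd(chapter: str, root: str = "quarto/contents/core"):
    """Return the first .qmd file whose name or directory matches the chapter id."""
    for qmd in sorted(Path(root).rglob("*.qmd")):
        if chapter in qmd.stem or chapter in qmd.parent.name:
            return qmd
    return None

print(find_chapter_qmd("intro"))  # e.g. .../introduction/introduction.qmd
```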
## How It Works

### 1. Building the Chapter

```bash
./binder html intro
```

This builds just the introduction chapter to `quarto/_build/html/`.

### 2. Extracting Figures

The script parses the HTML looking for:

```html
<figure class="quarto-float quarto-float-fig figure">
  <div aria-describedby="fig-ai-timeline-caption-...">
    <img src="introduction_files/mediabag/25cf57367...svg" class="img-fluid figure-img">
  </div>
  <figcaption id="fig-ai-timeline-caption-...">
    Figure 2: AI Development Timeline: ...
  </figcaption>
</figure>
```
It extracts:

- **Figure ID**: `fig-ai-timeline` (from the caption ID)
- **Image path**: `introduction_files/mediabag/25cf57367...svg`
- **Caption text**: "Figure 2: AI Development Timeline: ..."
- **Section heading**: from the nearest preceding `<h1>`, `<h2>`, or `<h3>`
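A minimal extraction sketch using BeautifulSoup, assuming the HTML structure shown above (the function name and returned fields are illustrative):

```python
# Illustrative figure extraction; the real parsing lives in generate_alt_text.py.
from bs4 import BeautifulSoup

def extract_figures(html_path: str) -> list[dict]:
    with open(html_path, encoding="utf-8") as f:
        soup = BeautifulSoup(f.read(), "html.parser")

    figures = []
    for fig in soup.find_all("figure"):
        caption = fig.find("figcaption")
        img = fig.find("img")
        if caption is None or img is None or not caption.get("id"):
            continue
        heading = fig.find_previous(["h1", "h2", "h3"])
        figures.append({
            # "fig-ai-timeline-caption-..." -> "fig-ai-timeline"
            "id": caption["id"].split("-caption")[0],
            "image": img.get("src"),
            "caption": caption.get_text(strip=True),
            "section": heading.get_text(strip=True) if heading else None,
        })
    return figures
```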
### 3. Generating Alt Text
For each image, the script:
- Encodes the image as base64
- Sends it to the vision model with context (caption, section)
- Gets back descriptive alt text following accessibility guidelines
Prompt guidelines:
- Be concise (1-2 sentences, ideally under 125 characters)
- Describe what's visually important, not what's obvious from caption
- Focus on key information the image conveys
- For diagrams: structure, flow, relationships
- For graphs: trends, comparisons, insights
- Don't start with "Image of" or "Figure showing"
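A hedged sketch of this step against Ollama's `/api/generate` endpoint; the prompt wording below is illustrative, not the script's exact prompt, and the OpenAI path follows the same pattern through its chat completions API:

```python
# Illustrative vision call; the script's actual prompt and error handling may differ.
import base64
import requests

def generate_alt_text_ollama(image_path: str, caption: str, section: str,
                             model: str = "llava",
                             url: str = "http://localhost:11434") -> str:
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("ascii")

    prompt = (
        f"This figure appears in the section '{section}' with the caption: {caption}\n"
        "Write concise alt text (1-2 sentences, under 125 characters if possible) "
        "describing what is visually important. Do not start with 'Image of' or "
        "'Figure showing'."
    )
    resp = requests.post(
        f"{url}/api/generate",
        json={"model": model, "prompt": prompt, "images": [image_b64], "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"].strip()
```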
### 4. Updating Source Files

The script searches `.qmd` files for figure references like:

- `{#fig-ai-timeline}`
- `#| label: fig-ai-timeline`
- `id="fig-ai-timeline"`

It adds or updates the `fig-alt` attribute.

Before:

```markdown
{#fig-ai-timeline}
```

After:

```markdown
{#fig-ai-timeline fig-alt="Timeline showing evolution from symbolic AI in 1950s through neural networks to modern large language models"}
```

Or for block figures:

```markdown
#| label: fig-ai-timeline
#| fig-cap: "AI Development Timeline"
#| fig-alt: "Timeline showing evolution from symbolic AI in 1950s through neural networks to modern large language models"
```
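A minimal sketch of the inline-figure case, assuming the `{#fig-...}` attribute syntax shown above; it deliberately skips figures that already have `fig-alt` and does not handle the `#| label:` block form:

```python
# Illustrative source update; the function name and exact behavior are assumptions,
# not the implementation inside generate_alt_text.py.
import re
from pathlib import Path

def add_fig_alt(qmd_path: str, fig_id: str, alt_text: str, dry_run: bool = False) -> bool:
    text = Path(qmd_path).read_text(encoding="utf-8")
    # Match {#fig-id ...} but not a longer ID that merely starts with fig_id.
    pattern = re.compile(r"\{#" + re.escape(fig_id) + r"(?![\w-])([^}]*)\}")

    def replace(match: re.Match) -> str:
        attrs = match.group(1)
        if "fig-alt=" in attrs:  # keep existing alt text untouched
            return match.group(0)
        return "{#" + fig_id + attrs + f' fig-alt="{alt_text}"' + "}"

    new_text, count = pattern.subn(replace, text)
    if count and not dry_run:
        Path(qmd_path).write_text(new_text, encoding="utf-8")
    return bool(count)
```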
## Output

The script provides:

- Progress logging to the console and to `generate_alt_text.log`
- Summary statistics at the end:
  - Total figures found
  - Alt text generated
  - Files updated
  - Errors encountered
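One way to reproduce the console-plus-file logging seen in the example runs below; this is a sketch, and the script's actual configuration may differ:

```python
# Mirrors the "%(asctime)s - %(levelname)s - %(message)s" lines in the examples.
import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
    handlers=[
        logging.StreamHandler(),                       # console
        logging.FileHandler("generate_alt_text.log"),  # log file
    ],
)
logging.info("Starting alt text generation for chapter: intro")
```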
## Troubleshooting

### "Could not find HTML file"

- Make sure the chapter name is correct
- Check that the build succeeded
- Look in `quarto/_build/html/contents/core/` for the HTML file

### "Could not find image file"
- The script tries multiple locations for images
- Check the build output directory structure
- Image might be in a subdirectory or mediabag

### "Could not find figure in .qmd file"

- The figure ID in HTML might not match the source
- Check the figure ID format in your `.qmd` files
- Try searching manually:

```bash
grep -r "fig-your-id" quarto/contents/core/
```
### Ollama connection error

```bash
# Make sure Ollama is running
ollama serve

# Test it's working
curl http://localhost:11434/api/tags
```
### OpenAI API errors

- Check your API key is set: `echo $OPENAI_API_KEY`
- Verify you have credits: https://platform.openai.com/usage
- Check rate limits if processing many images
## Best Practices

- **Start with dry run**: Always test with `--dry-run` first
- **One chapter at a time**: Process chapters individually to catch issues early
- **Review generated text**: Alt text quality matters for accessibility
- **Use Ollama for testing**: Free and fast for iterating on the workflow
- **Use OpenAI for production**: Generally produces higher quality descriptions
- **Commit incrementally**: Commit each chapter separately for easier review
## Future Enhancements

Potential improvements:

- Batch processing of multiple chapters
- Alt text quality validation
- Support for updating existing alt text selectively
- Integration with the `binder` CLI
- Cache generated alt text to avoid regenerating
- Support for other image formats (PDF figures, etc.)
## Examples

### Successful Run
```text
$ python generate_alt_text.py --chapter intro --provider ollama
2025-10-24 10:30:00 - INFO - Starting alt text generation for chapter: intro
2025-10-24 10:30:00 - INFO - Provider: ollama, Model: llava
2025-10-24 10:30:05 - INFO - Successfully built chapter: intro
2025-10-24 10:30:06 - INFO - Extracting figures from: .../introduction.html
2025-10-24 10:30:06 - INFO - Found figure: fig-ai-timeline
2025-10-24 10:30:06 - INFO - Found figure: fig-ml-workflow
2025-10-24 10:30:06 - INFO - Extracted 2 figures
2025-10-24 10:30:10 - INFO - Generating alt text for fig-ai-timeline using Ollama
2025-10-24 10:30:15 - INFO - Generated alt text: Timeline showing evolution...
2025-10-24 10:30:20 - INFO - Adding alt text to .../introduction.qmd for fig-ai-timeline
2025-10-24 10:30:20 - INFO - Successfully updated .../introduction.qmd

============================================================
ALT TEXT GENERATION SUMMARY
============================================================
Total figures found: 2
Alt text generated: 2
Files updated: 2
Errors: 0
============================================================

✅ Successfully processed chapter: intro
📝 Check the log file for details: generate_alt_text.log
```
### Dry Run

```text
$ python generate_alt_text.py --chapter intro --provider ollama --dry-run
...
[DRY RUN] Would update .../introduction.qmd
[DRY RUN] Line 45: {#fig-ai-timeline}
[DRY RUN] New line: {#fig-ai-timeline fig-alt="Timeline..."}
...
⚠️ DRY RUN MODE - No files were actually modified
```
## Contributing
If you improve the alt text generation quality or workflow:
- Test thoroughly with multiple chapters
- Update this README with your changes
- Add examples of improvements
- Document any new dependencies or requirements
## Related Tools

Other genai tools in this directory:

- `header_update.py` - Update section headers
- `quizzes.py` - Generate quizzes
- `footnote_assistant.py` - Add scholarly footnotes
- `fix_dashes.py` - Fix dash usage in text