# Alt Text Generation Tool

This tool automatically generates accessible alt text for images in the Machine Learning Systems book by combining local builds with AI vision models.

## Overview

The alt text generator works by:

1. **Building locally**: Uses `./binder html <chapter>` to build the specified chapter
2. **Parsing HTML**: Extracts all `<figure>` elements with their IDs, captions, and context
3. **Image analysis**: Sends images to a vision model (OpenAI GPT-4 Vision or Ollama's llava)
4. **Source update**: Finds corresponding figures in `.qmd` files and adds `fig-alt` attributes
## Why This Approach?
We scrape from the built HTML rather than directly from source because:
- Many figures use TikZ code, which is unreadable without rendering
- We get the actual visual output that readers see
- Context (captions, sections) is cleanly extracted from rendered HTML
- Figure IDs provide perfect matching back to source
## Prerequisites

### For OpenAI (Cloud)

```bash
export OPENAI_API_KEY=your_api_key_here
```

### For Ollama (Local)

1. Install Ollama from https://ollama.ai
2. Pull the llava vision model:

```bash
ollama pull llava
```
### Python Dependencies

From the repository root:

```bash
cd tools/scripts/genai
pip install -r requirements.txt
```
## Usage

### Basic Usage (Ollama, recommended for testing)

```bash
# Process the introduction chapter
python generate_alt_text.py --chapter intro --provider ollama

# Dry run to see what would happen without modifying files
python generate_alt_text.py --chapter intro --provider ollama --dry-run
```

### Using OpenAI

```bash
# Process the introduction chapter with OpenAI
python generate_alt_text.py --chapter intro --provider openai
```

### Advanced Options

```bash
# Use a different Ollama model
python generate_alt_text.py --chapter intro --provider ollama --model llava:13b

# Use a different OpenAI model
python generate_alt_text.py --chapter intro --provider openai --model gpt-4-vision-preview

# Specify Ollama URL (if not localhost)
python generate_alt_text.py --chapter intro --provider ollama --ollama-url http://192.168.1.100:11434
```
## Chapter Names

Common chapter identifiers:

- `intro` or `introduction` - Introduction chapter
- `ml_systems` - ML Systems chapter
- `dl_primer` - Deep Learning Primer
- `ai_workflow` - AI Workflow
- etc.

The script will search for matching files in the `quarto/contents/core/` directory (see the lookup sketch below).
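A minimal, hypothetical sketch of how that lookup could work; the function name and matching rule are illustrative, not the script's actual implementation:

```python
# Hypothetical chapter-to-file lookup; generate_alt_text.py's real logic may differ.
from pathlib import Path

def find_chapter_qmd(chapter: str, root: str = "quarto/contents/core"):
    """Return the first .qmd file whose name or directory matches the chapter id."""
    for qmd in sorted(Path(root).rglob("*.qmd")):
        if chapter in qmd.stem or chapter in qmd.parent.name:
            return qmd
    return None

print(find_chapter_qmd("intro"))  # e.g. .../introduction/introduction.qmd
```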
## How It Works

### 1. Building the Chapter

```bash
./binder html intro
```

This builds just the introduction chapter to `quarto/_build/html/`.

### 2. Extracting Figures

The script parses the HTML looking for:

```html
<figure class="quarto-float quarto-float-fig figure">
  <div aria-describedby="fig-ai-timeline-caption-...">
    <img src="introduction_files/mediabag/25cf57367...svg" class="img-fluid figure-img">
  </div>
  <figcaption id="fig-ai-timeline-caption-...">
    Figure 2: AI Development Timeline: ...
  </figcaption>
</figure>
```
It extracts:

- **Figure ID**: `fig-ai-timeline` (from the caption ID)
- **Image path**: `introduction_files/mediabag/25cf57367...svg`
- **Caption text**: "Figure 2: AI Development Timeline: ..."
- **Section heading**: from the nearest preceding `<h1>`, `<h2>`, or `<h3>`
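A minimal extraction sketch using BeautifulSoup, assuming the HTML structure shown above (the function name and returned fields are illustrative):

```python
# Illustrative figure extraction; the real parsing lives in generate_alt_text.py.
from bs4 import BeautifulSoup

def extract_figures(html_path: str) -> list[dict]:
    with open(html_path, encoding="utf-8") as f:
        soup = BeautifulSoup(f.read(), "html.parser")

    figures = []
    for fig in soup.find_all("figure"):
        caption = fig.find("figcaption")
        img = fig.find("img")
        if caption is None or img is None or not caption.get("id"):
            continue
        heading = fig.find_previous(["h1", "h2", "h3"])
        figures.append({
            # "fig-ai-timeline-caption-..." -> "fig-ai-timeline"
            "id": caption["id"].split("-caption")[0],
            "image": img.get("src"),
            "caption": caption.get_text(strip=True),
            "section": heading.get_text(strip=True) if heading else None,
        })
    return figures
```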
### 3. Generating Alt Text
For each image, the script:
- Encodes the image as base64
- Sends it to the vision model with context (caption, section)
- Gets back descriptive alt text following accessibility guidelines
Prompt guidelines:
- Be concise (1-2 sentences, ideally under 125 characters)
- Describe what's visually important, not what's obvious from caption
- Focus on key information the image conveys
- For diagrams: structure, flow, relationships
- For graphs: trends, comparisons, insights
- Don't start with "Image of" or "Figure showing"
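A hedged sketch of this step against Ollama's `/api/generate` endpoint; the prompt wording below is illustrative, not the script's exact prompt, and the OpenAI path follows the same pattern through its chat completions API:

```python
# Illustrative vision call; the script's actual prompt and error handling may differ.
import base64
import requests

def generate_alt_text_ollama(image_path: str, caption: str, section: str,
                             model: str = "llava",
                             url: str = "http://localhost:11434") -> str:
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("ascii")

    prompt = (
        f"This figure appears in the section '{section}' with the caption: {caption}\n"
        "Write concise alt text (1-2 sentences, under 125 characters if possible) "
        "describing what is visually important. Do not start with 'Image of' or "
        "'Figure showing'."
    )
    resp = requests.post(
        f"{url}/api/generate",
        json={"model": model, "prompt": prompt, "images": [image_b64], "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"].strip()
```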
### 4. Updating Source Files

The script searches `.qmd` files for figure references like:

- `{#fig-ai-timeline}`
- `#| label: fig-ai-timeline`
- `id="fig-ai-timeline"`

It adds or updates the `fig-alt` attribute.

Before:

```markdown
{#fig-ai-timeline}
```

After:

```markdown
{#fig-ai-timeline fig-alt="Timeline showing evolution from symbolic AI in 1950s through neural networks to modern large language models"}
```

Or for block figures:

```markdown
#| label: fig-ai-timeline
#| fig-cap: "AI Development Timeline"
#| fig-alt: "Timeline showing evolution from symbolic AI in 1950s through neural networks to modern large language models"
```
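A minimal sketch of the inline-figure case, assuming the `{#fig-...}` attribute syntax shown above; it deliberately skips figures that already have `fig-alt` and does not handle the `#| label:` block form:

```python
# Illustrative source update; the function name and exact behavior are assumptions,
# not the implementation inside generate_alt_text.py.
import re
from pathlib import Path

def add_fig_alt(qmd_path: str, fig_id: str, alt_text: str, dry_run: bool = False) -> bool:
    text = Path(qmd_path).read_text(encoding="utf-8")
    # Match {#fig-id ...} but not a longer ID that merely starts with fig_id.
    pattern = re.compile(r"\{#" + re.escape(fig_id) + r"(?![\w-])([^}]*)\}")

    def replace(match: re.Match) -> str:
        attrs = match.group(1)
        if "fig-alt=" in attrs:  # keep existing alt text untouched
            return match.group(0)
        return "{#" + fig_id + attrs + f' fig-alt="{alt_text}"' + "}"

    new_text, count = pattern.subn(replace, text)
    if count and not dry_run:
        Path(qmd_path).write_text(new_text, encoding="utf-8")
    return bool(count)
```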
## Output

The script provides:

- Progress logging to the console and to `generate_alt_text.log`
- Summary statistics at the end:
  - Total figures found
  - Alt text generated
  - Files updated
  - Errors encountered
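One way to reproduce the console-plus-file logging seen in the example runs below; this is a sketch, and the script's actual configuration may differ:

```python
# Mirrors the "%(asctime)s - %(levelname)s - %(message)s" lines in the examples.
import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
    handlers=[
        logging.StreamHandler(),                       # console
        logging.FileHandler("generate_alt_text.log"),  # log file
    ],
)
logging.info("Starting alt text generation for chapter: intro")
```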
## Troubleshooting

### "Could not find HTML file"

- Make sure the chapter name is correct
- Check that the build succeeded
- Look in `quarto/_build/html/contents/core/` for the HTML file

### "Could not find image file"
- The script tries multiple locations for images
- Check the build output directory structure
- Image might be in a subdirectory or mediabag

### "Could not find figure in .qmd file"

- The figure ID in HTML might not match the source
- Check the figure ID format in your `.qmd` files
- Try searching manually:

```bash
grep -r "fig-your-id" quarto/contents/core/
```
### Ollama connection error

```bash
# Make sure Ollama is running
ollama serve

# Test it's working
curl http://localhost:11434/api/tags
```
### OpenAI API errors

- Check your API key is set: `echo $OPENAI_API_KEY`
- Verify you have credits: https://platform.openai.com/usage
- Check rate limits if processing many images
## Best Practices

- **Start with dry run**: Always test with `--dry-run` first
- **One chapter at a time**: Process chapters individually to catch issues early
- **Review generated text**: Alt text quality matters for accessibility
- **Use Ollama for testing**: Free and fast for iterating on the workflow
- **Use OpenAI for production**: Generally produces higher quality descriptions
- **Commit incrementally**: Commit each chapter separately for easier review
## Future Enhancements

Potential improvements:

- Batch processing of multiple chapters
- Alt text quality validation
- Support for updating existing alt text selectively
- Integration with the `binder` CLI
- Cache generated alt text to avoid regenerating
- Support for other image formats (PDF figures, etc.)
## Examples

### Successful Run
```text
$ python generate_alt_text.py --chapter intro --provider ollama
2025-10-24 10:30:00 - INFO - Starting alt text generation for chapter: intro
2025-10-24 10:30:00 - INFO - Provider: ollama, Model: llava
2025-10-24 10:30:05 - INFO - Successfully built chapter: intro
2025-10-24 10:30:06 - INFO - Extracting figures from: .../introduction.html
2025-10-24 10:30:06 - INFO - Found figure: fig-ai-timeline
2025-10-24 10:30:06 - INFO - Found figure: fig-ml-workflow
2025-10-24 10:30:06 - INFO - Extracted 2 figures
2025-10-24 10:30:10 - INFO - Generating alt text for fig-ai-timeline using Ollama
2025-10-24 10:30:15 - INFO - Generated alt text: Timeline showing evolution...
2025-10-24 10:30:20 - INFO - Adding alt text to .../introduction.qmd for fig-ai-timeline
2025-10-24 10:30:20 - INFO - Successfully updated .../introduction.qmd

============================================================
ALT TEXT GENERATION SUMMARY
============================================================
Total figures found: 2
Alt text generated: 2
Files updated: 2
Errors: 0
============================================================

✅ Successfully processed chapter: intro
📝 Check the log file for details: generate_alt_text.log
```
### Dry Run

```text
$ python generate_alt_text.py --chapter intro --provider ollama --dry-run
...
[DRY RUN] Would update .../introduction.qmd
[DRY RUN] Line 45: {#fig-ai-timeline}
[DRY RUN] New line: {#fig-ai-timeline fig-alt="Timeline..."}
...
⚠️ DRY RUN MODE - No files were actually modified
```
## Contributing
If you improve the alt text generation quality or workflow:
- Test thoroughly with multiple chapters
- Update this README with your changes
- Add examples of improvements
- Document any new dependencies or requirements
## Related Tools

Other genai tools in this directory:

- `header_update.py` - Update section headers
- `quizzes.py` - Generate quizzes
- `footnote_assistant.py` - Add scholarly footnotes
- `fix_dashes.py` - Fix dash usage in text