Files
cs249r_book/tools/scripts/genai/README_ALT_TEXT.md
Vijay Janapa Reddi c20c73508b feat(accessibility): Add GenAI-powered alt-text generation tools
- Add generate_alt_text.py script for automated image alt-text generation
- Add README_ALT_TEXT.md with detailed usage instructions
- Add QUICK_START_ALT_TEXT.md for quick reference
- Uses Google Gemini API to generate descriptive alt-text for figures

Related to accessibility improvements for image descriptions.
Work in progress - requires GitHub issue tracking.
2025-11-09 16:53:44 -05:00

269 lines
8.4 KiB
Markdown

# Alt Text Generation Tool
This tool automatically generates accessible alt text for images in the Machine Learning Systems book by combining local builds with AI vision models.
## Overview
The alt text generator works by:
1. **Building locally**: Uses `./binder html <chapter>` to build the specified chapter
2. **Parsing HTML**: Extracts all `<figure>` elements with their IDs, captions, and context
3. **Image analysis**: Sends images to a vision model (OpenAI GPT-4 Vision or Ollama's llava)
4. **Source update**: Finds corresponding figures in `.qmd` files and adds `fig-alt` attributes
## Why This Approach?
We scrape from the **built HTML** rather than directly from source because:
- Many figures use TikZ code, which is unreadable without rendering
- We get the actual visual output that readers see
- Context (captions, sections) is cleanly extracted from rendered HTML
- Figure IDs provide perfect matching back to source
## Prerequisites
### For OpenAI (Cloud)
```bash
export OPENAI_API_KEY=your_api_key_here
```
### For Ollama (Local)
1. Install Ollama from https://ollama.ai
2. Pull the llava vision model:
```bash
ollama pull llava
```
### Python Dependencies
```bash
cd /Users/VJ/GitHub/MLSysBook/tools/scripts/genai
pip install -r requirements.txt
```
## Usage
### Basic Usage (Ollama, recommended for testing)
```bash
# Process the introduction chapter
python generate_alt_text.py --chapter intro --provider ollama
# Dry run to see what would happen without modifying files
python generate_alt_text.py --chapter intro --provider ollama --dry-run
```
### Using OpenAI
```bash
# Process the introduction chapter with OpenAI
python generate_alt_text.py --chapter intro --provider openai
```
### Advanced Options
```bash
# Use a different Ollama model
python generate_alt_text.py --chapter intro --provider ollama --model llava:13b
# Use a different OpenAI model
python generate_alt_text.py --chapter intro --provider openai --model gpt-4-vision-preview
# Specify Ollama URL (if not localhost)
python generate_alt_text.py --chapter intro --provider ollama --ollama-url http://192.168.1.100:11434
```
## Chapter Names
Common chapter identifiers:
- `intro` or `introduction` - Introduction chapter
- `ml_systems` - ML Systems chapter
- `dl_primer` - Deep Learning Primer
- `ai_workflow` - AI Workflow
- etc.
The script will search for matching files in the `quarto/contents/core/` directory.
## How It Works
### 1. Building the Chapter
```bash
./binder html intro
```
This builds just the introduction chapter to `quarto/_build/html/`.
### 2. Extracting Figures
The script parses the HTML looking for:
```html
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-ai-timeline-caption-...">
<img src="introduction_files/mediabag/25cf57367...svg" class="img-fluid figure-img">
</div>
<figcaption id="fig-ai-timeline-caption-...">
Figure 2: AI Development Timeline: ...
</figcaption>
</figure>
```
It extracts:
- Figure ID: `fig-ai-timeline` (from the caption ID)
- Image path: `introduction_files/mediabag/25cf57367...svg`
- Caption text: "Figure 2: AI Development Timeline: ..."
- Section heading: From the nearest preceding `<h1>`, `<h2>`, or `<h3>`
### 3. Generating Alt Text
For each image, the script:
- Encodes the image as base64
- Sends it to the vision model with context (caption, section)
- Gets back descriptive alt text following accessibility guidelines
**Prompt guidelines:**
- Be concise (1-2 sentences, ideally under 125 characters)
- Describe what's visually important, not what's obvious from caption
- Focus on key information the image conveys
- For diagrams: structure, flow, relationships
- For graphs: trends, comparisons, insights
- Don't start with "Image of" or "Figure showing"
### 4. Updating Source Files
The script searches `.qmd` files for figure references like:
- `![caption](image.png){#fig-ai-timeline}`
- `#| label: fig-ai-timeline`
- `id="fig-ai-timeline"`
It adds or updates the `fig-alt` attribute:
**Before:**
```markdown
![AI Development Timeline](images/timeline.svg){#fig-ai-timeline}
```
**After:**
```markdown
![AI Development Timeline](images/timeline.svg){#fig-ai-timeline fig-alt="Timeline showing evolution from symbolic AI in 1950s through neural networks to modern large language models"}
```
Or for block figures:
```markdown
#| label: fig-ai-timeline
#| fig-cap: "AI Development Timeline"
#| fig-alt: "Timeline showing evolution from symbolic AI in 1950s through neural networks to modern large language models"
```
## Output
The script provides:
- Progress logging to console and `generate_alt_text.log`
- Summary statistics at the end:
- Total figures found
- Alt text generated
- Files updated
- Errors encountered
## Troubleshooting
### "Could not find HTML file"
- Make sure the chapter name is correct
- Check that the build succeeded
- Look in `quarto/_build/html/contents/core/` for the HTML file
### "Could not find image file"
- The script tries multiple locations for images
- Check the build output directory structure
- Image might be in a subdirectory or mediabag
### "Could not find figure in .qmd file"
- The figure ID in HTML might not match the source
- Check the figure ID format in your `.qmd` files
- Try searching manually: `grep -r "fig-your-id" quarto/contents/core/`
### Ollama connection error
```bash
# Make sure Ollama is running
ollama serve
# Test it's working
curl http://localhost:11434/api/tags
```
### OpenAI API errors
- Check your API key is set: `echo $OPENAI_API_KEY`
- Verify you have credits: https://platform.openai.com/usage
- Check rate limits if processing many images
## Best Practices
1. **Start with dry run**: Always test with `--dry-run` first
2. **One chapter at a time**: Process chapters individually to catch issues early
3. **Review generated text**: Alt text quality matters for accessibility
4. **Use Ollama for testing**: Free and fast for iterating on the workflow
5. **Use OpenAI for production**: Generally produces higher quality descriptions
6. **Commit incrementally**: Commit each chapter separately for easier review
## Future Enhancements
Potential improvements:
- [ ] Batch processing of multiple chapters
- [ ] Alt text quality validation
- [ ] Support for updating existing alt text selectively
- [ ] Integration with the `binder` CLI
- [ ] Cache generated alt text to avoid regenerating
- [ ] Support for other image formats (PDF figures, etc.)
## Examples
### Successful Run
```bash
$ python generate_alt_text.py --chapter intro --provider ollama
2025-10-24 10:30:00 - INFO - Starting alt text generation for chapter: intro
2025-10-24 10:30:00 - INFO - Provider: ollama, Model: llava
2025-10-24 10:30:05 - INFO - Successfully built chapter: intro
2025-10-24 10:30:06 - INFO - Extracting figures from: .../introduction.html
2025-10-24 10:30:06 - INFO - Found figure: fig-ai-timeline
2025-10-24 10:30:06 - INFO - Found figure: fig-ml-workflow
2025-10-24 10:30:06 - INFO - Extracted 2 figures
2025-10-24 10:30:10 - INFO - Generating alt text for fig-ai-timeline using Ollama
2025-10-24 10:30:15 - INFO - Generated alt text: Timeline showing evolution...
2025-10-24 10:30:20 - INFO - Adding alt text to .../introduction.qmd for fig-ai-timeline
2025-10-24 10:30:20 - INFO - Successfully updated .../introduction.qmd
============================================================
ALT TEXT GENERATION SUMMARY
============================================================
Total figures found: 2
Alt text generated: 2
Files updated: 2
Errors: 0
============================================================
✅ Successfully processed chapter: intro
📝 Check the log file for details: generate_alt_text.log
```
### Dry Run
```bash
$ python generate_alt_text.py --chapter intro --provider ollama --dry-run
...
[DRY RUN] Would update .../introduction.qmd
[DRY RUN] Line 45: ![AI Timeline](images/timeline.svg){#fig-ai-timeline}
[DRY RUN] New line: ![AI Timeline](images/timeline.svg){#fig-ai-timeline fig-alt="Timeline..."}
...
⚠️ DRY RUN MODE - No files were actually modified
```
## Contributing
If you improve the alt text generation quality or workflow:
1. Test thoroughly with multiple chapters
2. Update this README with your changes
3. Add examples of improvements
4. Document any new dependencies or requirements
## Related Tools
Other genai tools in this directory:
- `header_update.py` - Update section headers
- `quizzes.py` - Generate quizzes
- `footnote_assistant.py` - Add scholarly footnotes
- `fix_dashes.py` - Fix dash usage in text