mirror of
https://github.com/harvard-edge/cs249r_book.git
synced 2026-04-30 17:48:27 -05:00
- Add generate_alt_text.py script for automated image alt-text generation - Add README_ALT_TEXT.md with detailed usage instructions - Add QUICK_START_ALT_TEXT.md for quick reference - Uses Google Gemini API to generate descriptive alt-text for figures Related to accessibility improvements for image descriptions. Work in progress - requires GitHub issue tracking.
269 lines
8.4 KiB
Markdown
# Alt Text Generation Tool

This tool automatically generates accessible alt text for images in the Machine Learning Systems book by combining local builds with AI vision models.

## Overview

The alt text generator works by:

1. **Building locally**: Uses `./binder html <chapter>` to build the specified chapter
2. **Parsing HTML**: Extracts all `<figure>` elements with their IDs, captions, and context
3. **Image analysis**: Sends images to a vision model (OpenAI GPT-4 Vision or Ollama's llava)
4. **Source update**: Finds corresponding figures in `.qmd` files and adds `fig-alt` attributes

## Why This Approach?

We scrape from the **built HTML** rather than directly from source because:

- Many figures use TikZ code, which is unreadable without rendering
- We get the actual visual output that readers see
- Context (captions, sections) is cleanly extracted from rendered HTML
- Figure IDs provide perfect matching back to source

## Prerequisites

### For OpenAI (Cloud)
```bash
export OPENAI_API_KEY=your_api_key_here
```

### For Ollama (Local)
1. Install Ollama from https://ollama.ai
2. Pull the llava vision model:
   ```bash
   ollama pull llava
   ```

### Python Dependencies
```bash
# Run from the repository root
cd tools/scripts/genai
pip install -r requirements.txt
```

## Usage

### Basic Usage (Ollama, recommended for testing)
```bash
# Process the introduction chapter
python generate_alt_text.py --chapter intro --provider ollama

# Dry run to see what would happen without modifying files
python generate_alt_text.py --chapter intro --provider ollama --dry-run
```

### Using OpenAI
```bash
# Process the introduction chapter with OpenAI
python generate_alt_text.py --chapter intro --provider openai
```

### Advanced Options
```bash
# Use a different Ollama model
python generate_alt_text.py --chapter intro --provider ollama --model llava:13b

# Use a different OpenAI model
python generate_alt_text.py --chapter intro --provider openai --model gpt-4-vision-preview

# Specify Ollama URL (if not localhost)
python generate_alt_text.py --chapter intro --provider ollama --ollama-url http://192.168.1.100:11434
```

## Chapter Names

Common chapter identifiers include:
- `intro` or `introduction` - Introduction chapter
- `ml_systems` - ML Systems chapter
- `dl_primer` - Deep Learning Primer
- `ai_workflow` - AI Workflow

The script searches for matching files in the `quarto/contents/core/` directory.

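The lookup described above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the function name `find_chapter_sources`, the `ALIASES` table, and the assumption that each chapter lives in a `.qmd` file named after its identifier are all hypothetical.

```python
from pathlib import Path

# Hypothetical alias table; only `intro`/`introduction` is documented above.
ALIASES = {"intro": "introduction"}


def find_chapter_sources(chapter: str, root: Path) -> list[Path]:
    """Return the .qmd files under `root` whose file stem matches the chapter."""
    stem = ALIASES.get(chapter, chapter)
    # rglob walks all subdirectories, so nested chapter folders are covered.
    return sorted(p for p in root.rglob("*.qmd") if p.stem == stem)
```

A call such as `find_chapter_sources("intro", Path("quarto/contents/core"))` would then return every `introduction.qmd` found under that directory, or an empty list when the chapter name does not match anything.
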
## How It Works

### 1. Building the Chapter
```bash
./binder html intro
```
This builds just the introduction chapter to `quarto/_build/html/`.

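From Python, the build step could be wrapped along these lines. This is a sketch under assumptions: the helper name `build_chapter` and its injectable `run` parameter are hypothetical, and the only command taken from this README is `./binder html <chapter>`.

```python
import subprocess
from pathlib import Path


def build_chapter(chapter: str, repo_root: Path, run=subprocess.run) -> bool:
    """Run `./binder html <chapter>` from the repository root.

    Returns True when the command exits with status 0.
    """
    result = run(["./binder", "html", chapter], cwd=repo_root)
    return result.returncode == 0
```

Passing the runner as a parameter keeps the helper testable without invoking a real build.
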
### 2. Extracting Figures
The script parses the HTML looking for:
```html
<figure class="quarto-float quarto-float-fig figure">
  <div aria-describedby="fig-ai-timeline-caption-...">
    <img src="introduction_files/mediabag/25cf57367...svg" class="img-fluid figure-img">
  </div>
  <figcaption id="fig-ai-timeline-caption-...">
    Figure 2: AI Development Timeline: ...
  </figcaption>
</figure>
```

It extracts:
- Figure ID: `fig-ai-timeline` (from the caption ID)
- Image path: `introduction_files/mediabag/25cf57367...svg`
- Caption text: "Figure 2: AI Development Timeline: ..."
- Section heading: From the nearest preceding `<h1>`, `<h2>`, or `<h3>`

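The extraction above can be sketched with the standard library's `html.parser`. This is an illustration only: the class and function names are hypothetical, the real script may use a different parser, and the caption ID is assumed to have the shape `<figure-id>-caption-<suffix>`.

```python
import re
from html.parser import HTMLParser


class FigureExtractor(HTMLParser):
    """Collect figure ID, image path, caption, and nearest preceding heading."""

    def __init__(self):
        super().__init__()
        self.figures = []     # finished figure records
        self._fig = None      # record for the <figure> currently open
        self._capture = None  # "heading" or "caption" while collecting text
        self._text = []
        self._section = ""

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("h1", "h2", "h3"):
            self._capture, self._text = "heading", []
        elif tag == "figure":
            self._fig = {"id": None, "src": None, "caption": "",
                         "section": self._section}
        elif self._fig is not None and tag == "img" and self._fig["src"] is None:
            self._fig["src"] = attrs.get("src")
        elif self._fig is not None and tag == "figcaption":
            # Assumed ID shape: "<figure-id>-caption-<suffix>"
            match = re.match(r"(fig-.+?)-caption", attrs.get("id") or "")
            self._fig["id"] = match.group(1) if match else None
            self._capture, self._text = "caption", []

    def handle_data(self, data):
        if self._capture:
            self._text.append(data)

    def handle_endtag(self, tag):
        text = " ".join("".join(self._text).split())  # collapse whitespace
        if tag in ("h1", "h2", "h3") and self._capture == "heading":
            self._section, self._capture = text, None
        elif tag == "figcaption" and self._capture == "caption":
            self._fig["caption"], self._capture = text, None
        elif tag == "figure" and self._fig is not None:
            self.figures.append(self._fig)
            self._fig = None


def extract_figures(html: str) -> list[dict]:
    parser = FigureExtractor()
    parser.feed(html)
    parser.close()
    return parser.figures
```

Each returned record carries the four fields listed above under the keys `id`, `src`, `caption`, and `section`.
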
### 3. Generating Alt Text
For each image, the script:
- Encodes the image as base64
- Sends it to the vision model with context (caption, section)
- Gets back descriptive alt text following accessibility guidelines

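For the Ollama provider, the request for one image might be assembled roughly as below. The helper name `build_ollama_request` and the prompt wording are assumptions made for illustration; the JSON fields (`model`, `prompt`, `images`, `stream`) follow Ollama's `/api/generate` endpoint, and the script's actual request may differ.

```python
import base64


def build_ollama_request(image_bytes: bytes, caption: str, section: str,
                         model: str = "llava") -> dict:
    """Build a JSON body for Ollama's POST /api/generate endpoint."""
    # Hypothetical prompt wording; the script's real prompt is not shown here.
    prompt = (
        "Write concise alt text for this figure.\n"
        f"Section: {section}\n"
        f"Caption: {caption}"
    )
    return {
        "model": model,
        "prompt": prompt,
        # Ollama expects images as base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # ask for one complete response instead of chunks
    }
```

The resulting dictionary can be serialized with `json.dumps` and posted to `<ollama-url>/api/generate`.
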
**Prompt guidelines:**
- Be concise (1-2 sentences, ideally under 125 characters)
- Describe what's visually important, not what's obvious from caption
- Focus on key information the image conveys
- For diagrams: structure, flow, relationships
- For graphs: trends, comparisons, insights
- Don't start with "Image of" or "Figure showing"

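The mechanical parts of these guidelines (length and opening phrase) can be checked automatically when reviewing output. The function below is a hypothetical reviewer aid, not part of the script; it cannot judge whether a description is accurate or useful.

```python
def check_alt_text(alt: str, max_chars: int = 125) -> list[str]:
    """Return a list of guideline problems found in one alt text string."""
    problems = []
    text = alt.strip()
    if not text:
        problems.append("empty")
    if len(text) > max_chars:
        problems.append(f"longer than {max_chars} characters")
    if text.lower().startswith(("image of", "figure showing")):
        problems.append("starts with a redundant phrase")
    return problems
```

An empty result means the text passes these checks; human review is still needed for quality.
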
### 4. Updating Source Files
The script searches `.qmd` files for figure references like:
- `{#fig-ai-timeline}`
- `#| label: fig-ai-timeline`
- `id="fig-ai-timeline"`

It adds or updates the `fig-alt` attribute:

**Before:**
```markdown
{#fig-ai-timeline}
```

**After:**
```markdown
{#fig-ai-timeline fig-alt="Timeline showing evolution from symbolic AI in 1950s through neural networks to modern large language models"}
```

Or for block figures:
```markdown
#| label: fig-ai-timeline
#| fig-cap: "AI Development Timeline"
#| fig-alt: "Timeline showing evolution from symbolic AI in 1950s through neural networks to modern large language models"
```

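For the inline `{#fig-...}` form, the add-or-update step could look something like this. It is a sketch, not the script's implementation: `add_fig_alt` is a hypothetical name, the block form (`#| fig-alt:`) is not handled, and existing attribute values containing `}` would defeat the simple pattern.

```python
import re


def add_fig_alt(line: str, fig_id: str, alt: str) -> str:
    """Insert or replace fig-alt in an inline block such as {#fig-id ...}."""
    safe_alt = alt.replace('"', "'")  # keep the attribute's double quotes balanced
    # The lookahead stops fig-ai-timeline from matching fig-ai-timeline-2.
    pattern = re.compile(r"\{#" + re.escape(fig_id) + r"(?=[\s}])([^}]*)\}")

    def rewrite(match):
        # Drop any previous fig-alt value, keep the other attributes.
        rest = re.sub(r'\s*fig-alt="[^"]*"', "", match.group(1))
        return f'{{#{fig_id} fig-alt="{safe_alt}"{rest}}}'

    return pattern.sub(rewrite, line)
```

Lines that do not contain the given figure ID are returned unchanged, so the function can be mapped over every line of a `.qmd` file.
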
## Output

The script provides:
- Progress logging to console and `generate_alt_text.log`
- Summary statistics at the end:
  - Total figures found
  - Alt text generated
  - Files updated
  - Errors encountered

## Troubleshooting

### "Could not find HTML file"
- Make sure the chapter name is correct
- Check that the build succeeded
- Look in `quarto/_build/html/contents/core/` for the HTML file

### "Could not find image file"
- The script tries multiple locations for images
- Check the build output directory structure
- Image might be in a subdirectory or mediabag

### "Could not find figure in .qmd file"
- The figure ID in HTML might not match the source
- Check the figure ID format in your `.qmd` files
- Try searching manually: `grep -r "fig-your-id" quarto/contents/core/`

### Ollama connection error
```bash
# Make sure Ollama is running
ollama serve

# Test it's working
curl http://localhost:11434/api/tags
```

### OpenAI API errors
- Check your API key is set: `echo $OPENAI_API_KEY`
- Verify you have credits: https://platform.openai.com/usage
- Check rate limits if processing many images

## Best Practices

1. **Start with dry run**: Always test with `--dry-run` first
2. **One chapter at a time**: Process chapters individually to catch issues early
3. **Review generated text**: Alt text quality matters for accessibility
4. **Use Ollama for testing**: Free and fast for iterating on the workflow
5. **Use OpenAI for production**: Generally produces higher quality descriptions
6. **Commit incrementally**: Commit each chapter separately for easier review

## Future Enhancements

Potential improvements:
- [ ] Batch processing of multiple chapters
- [ ] Alt text quality validation
- [ ] Support for updating existing alt text selectively
- [ ] Integration with the `binder` CLI
- [ ] Cache generated alt text to avoid regenerating
- [ ] Support for other image formats (PDF figures, etc.)

## Examples

### Successful Run
```bash
$ python generate_alt_text.py --chapter intro --provider ollama

2025-10-24 10:30:00 - INFO - Starting alt text generation for chapter: intro
2025-10-24 10:30:00 - INFO - Provider: ollama, Model: llava
2025-10-24 10:30:05 - INFO - Successfully built chapter: intro
2025-10-24 10:30:06 - INFO - Extracting figures from: .../introduction.html
2025-10-24 10:30:06 - INFO - Found figure: fig-ai-timeline
2025-10-24 10:30:06 - INFO - Found figure: fig-ml-workflow
2025-10-24 10:30:06 - INFO - Extracted 2 figures
2025-10-24 10:30:10 - INFO - Generating alt text for fig-ai-timeline using Ollama
2025-10-24 10:30:15 - INFO - Generated alt text: Timeline showing evolution...
2025-10-24 10:30:20 - INFO - Adding alt text to .../introduction.qmd for fig-ai-timeline
2025-10-24 10:30:20 - INFO - Successfully updated .../introduction.qmd

============================================================
ALT TEXT GENERATION SUMMARY
============================================================
Total figures found: 2
Alt text generated: 2
Files updated: 2
Errors: 0
============================================================

✅ Successfully processed chapter: intro
📝 Check the log file for details: generate_alt_text.log
```

### Dry Run
```bash
$ python generate_alt_text.py --chapter intro --provider ollama --dry-run

...
[DRY RUN] Would update .../introduction.qmd
[DRY RUN] Line 45: {#fig-ai-timeline}
[DRY RUN] New line: {#fig-ai-timeline fig-alt="Timeline..."}
...

⚠️ DRY RUN MODE - No files were actually modified
```

## Contributing

If you improve the alt text generation quality or workflow:
1. Test thoroughly with multiple chapters
2. Update this README with your changes
3. Add examples of improvements
4. Document any new dependencies or requirements

## Related Tools

Other genai tools in this directory:
- `header_update.py` - Update section headers
- `quizzes.py` - Generate quizzes
- `footnote_assistant.py` - Add scholarly footnotes
- `fix_dashes.py` - Fix dash usage in text