Revert "Merge branch 'feature/alt-text-generation' into dev"

This reverts commit 9e2bfe4e64, reversing
changes made to 0b3f04d82d.
Author: Vijay Janapa Reddi
Date: 2025-11-10 19:57:42 -05:00
Parent: 9e2bfe4e64
Commit: afa6fdd36f
3 changed files with 0 additions and 1171 deletions


@@ -1,163 +0,0 @@
# Quick Start: Testing Alt Text Generation
## Immediate Next Steps
### 1. Install Dependencies (if not already installed)
```bash
cd /Users/VJ/GitHub/MLSysBook/tools/scripts/genai
pip install -r requirements.txt
```
### 2. Setup Ollama for Local Testing
```bash
# Install Ollama from https://ollama.ai (if not installed)
# Then pull the vision model
ollama pull llava
```
### 3. Test with Introduction Chapter (Dry Run)
```bash
cd /Users/VJ/GitHub/MLSysBook
python tools/scripts/genai/generate_alt_text.py --chapter intro --provider ollama --dry-run
```
This will:
1. Build the introduction chapter using `./binder html intro`
2. Extract all figures from the built HTML
3. Generate alt text using Ollama's llava model
4. Show what changes would be made (but not actually modify files)
### 4. Review the Output
Check the console output and `generate_alt_text.log` to see:
- How many figures were found
- What alt text was generated
- Which files would be updated
### 5. If Happy, Run for Real
```bash
python tools/scripts/genai/generate_alt_text.py --chapter intro --provider ollama
```
This will actually modify your `.qmd` files to add `fig-alt` attributes.
## What This Script Does Differently
**Key Innovation**: We use the **rendered output** as the source of truth because:
1. Your TikZ code is unreadable without rendering
2. The built HTML has all figures with clean IDs
3. We can see exactly what readers see
4. Figure IDs provide perfect matching back to source
**Workflow**:
```
Source .qmd → Build → Rendered HTML → Extract Figures → Vision AI → Alt Text → Update Source .qmd
     ↑                                                                                │
     └────────────────────────────────────────────────────────────────────────────────┘
```
## Example Output
For a figure like this in your HTML:
```html
<figure class="quarto-float quarto-float-fig figure">
  <figcaption>Figure 2: AI Development Timeline</figcaption>
  <img src="introduction_files/mediabag/25cf57367...svg">
</figure>
```
The script will:
1. Extract figure ID: `fig-ai-timeline`
2. Download/locate the image
3. Send to vision model with caption context
4. Get alt text: "Timeline showing evolution from symbolic AI in 1950s to modern LLMs"
5. Find in source: `![Timeline](image.svg){#fig-ai-timeline}`
6. Update to: `![Timeline](image.svg){#fig-ai-timeline fig-alt="Timeline showing..."}`
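The final rewrite (steps 5 and 6) boils down to a small string transform. Here is a minimal sketch of that idea; `add_fig_alt` is a hypothetical helper for illustration, not the actual function in the script, and it only handles the plain inline-figure case:

```python
def add_fig_alt(line: str, figure_id: str, alt_text: str) -> str:
    """Append a fig-alt attribute to a plain inline Quarto figure (sketch only)."""
    # Escape double quotes so the attribute value stays well-formed
    safe = alt_text.replace('"', '\\"')
    return line.replace(f"{{#{figure_id}}}", f'{{#{figure_id} fig-alt="{safe}"}}')

print(add_fig_alt(
    "![Timeline](image.svg){#fig-ai-timeline}",
    "fig-ai-timeline",
    "Timeline showing evolution from symbolic AI to modern LLMs",
))
```

The real script also handles `#| label:` block figures and updating an existing `fig-alt`.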
## Testing Tips
1. **Start small**: Test with just the intro chapter first
2. **Use dry-run**: Always do a dry run first to preview changes
3. **Check the log**: `generate_alt_text.log` has detailed progress
4. **Review quality**: Read the generated alt text to ensure it's useful
5. **Iterate**: If quality isn't great, you can adjust the prompt in the script
## Troubleshooting First Run
### If binder build fails:
```bash
# Test binder directly
cd /Users/VJ/GitHub/MLSysBook
./binder html intro
```
### If Ollama connection fails:
```bash
# Make sure Ollama is running
ollama serve
# Test it works (llava accepts an image path inside the prompt)
ollama run llava "Describe this image: ./some_test_image.png"
```
### If figure matching fails:
The script logs which figures it finds and which .qmd files it searches.
Check the log to see if:
- Figures were extracted from HTML
- The figure IDs match what's in your .qmd files
- The .qmd files were found in the right directory
## Next Steps After Testing
Once you verify this works on the introduction chapter:
1. **Process other chapters**: Run for each chapter individually
2. **Review all changes**: Use git diff to review the added alt text
3. **Iterate on quality**: Adjust the prompt if needed for better descriptions
4. **Scale up**: Process the entire book chapter by chapter
## Scaling to Full Book
```bash
# Process each chapter
for chapter in intro ml_systems dl_primer ai_workflow data_engineering; do
  echo "Processing $chapter..."
  python tools/scripts/genai/generate_alt_text.py --chapter "$chapter" --provider ollama
  git add .
  git commit -m "feat(accessibility): Add alt text to $chapter chapter"
done
```
## Cost Considerations
**Ollama (Local)**:
- ✅ Free
- ✅ Unlimited usage
- ✅ Fast for testing
- ⚠️ May be less accurate than GPT-4 Vision
**OpenAI**:
- ⚠️ Costs per image (roughly $0.01-0.03 per image)
- ✅ Higher quality descriptions
- ✅ Better understanding of technical content
- 💡 Use for final production after testing workflow with Ollama
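A back-of-the-envelope estimate puts the OpenAI option in perspective; the figure count below is a made-up example, only the per-image price range comes from the note above:

```python
# Illustrative cost estimate; num_figures is a hypothetical total for the book
num_figures = 400
low_rate, high_rate = 0.01, 0.03  # rough $/image range quoted above
print(f"One-off cost: ~${num_figures * low_rate:.2f} to ${num_figures * high_rate:.2f}")
```

Even at the high end, a full pass over a few hundred figures costs on the order of tens of dollars.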
## Questions to Consider
1. **Quality bar**: What level of detail do you want in alt text?
2. **Technical terms**: Should alt text use technical ML terminology?
3. **Length**: Prefer shorter (accessible) or longer (detailed)?
4. **Review process**: Who should review generated alt text?
You can adjust the `ALT_TEXT_PROMPT` in the script to guide the model's behavior.
## Ready to Test?
Run this command to start:
```bash
cd /Users/VJ/GitHub/MLSysBook
python tools/scripts/genai/generate_alt_text.py --chapter intro --provider ollama --dry-run
```
Then check the output and let me know how it goes!


@@ -1,268 +0,0 @@
# Alt Text Generation Tool
This tool automatically generates accessible alt text for images in the Machine Learning Systems book by combining local builds with AI vision models.
## Overview
The alt text generator works by:
1. **Building locally**: Uses `./binder html <chapter>` to build the specified chapter
2. **Parsing HTML**: Extracts all `<figure>` elements with their IDs, captions, and context
3. **Image analysis**: Sends images to a vision model (OpenAI GPT-4 Vision or Ollama's llava)
4. **Source update**: Finds corresponding figures in `.qmd` files and adds `fig-alt` attributes
## Why This Approach?
We scrape from the **built HTML** rather than directly from source because:
- Many figures use TikZ code, which is unreadable without rendering
- We get the actual visual output that readers see
- Context (captions, sections) is cleanly extracted from rendered HTML
- Figure IDs provide perfect matching back to source
## Prerequisites
### For OpenAI (Cloud)
```bash
export OPENAI_API_KEY=your_api_key_here
```
### For Ollama (Local)
1. Install Ollama from https://ollama.ai
2. Pull the llava vision model:
```bash
ollama pull llava
```
### Python Dependencies
```bash
cd /Users/VJ/GitHub/MLSysBook/tools/scripts/genai
pip install -r requirements.txt
```
## Usage
### Basic Usage (Ollama, recommended for testing)
```bash
# Process the introduction chapter
python generate_alt_text.py --chapter intro --provider ollama
# Dry run to see what would happen without modifying files
python generate_alt_text.py --chapter intro --provider ollama --dry-run
```
### Using OpenAI
```bash
# Process the introduction chapter with OpenAI
python generate_alt_text.py --chapter intro --provider openai
```
### Advanced Options
```bash
# Use a different Ollama model
python generate_alt_text.py --chapter intro --provider ollama --model llava:13b
# Use a different OpenAI model
python generate_alt_text.py --chapter intro --provider openai --model gpt-4-vision-preview
# Specify Ollama URL (if not localhost)
python generate_alt_text.py --chapter intro --provider ollama --ollama-url http://192.168.1.100:11434
```
## Chapter Names
Common chapter identifiers:
- `intro` or `introduction` - Introduction chapter
- `ml_systems` - ML Systems chapter
- `dl_primer` - Deep Learning Primer
- `ai_workflow` - AI Workflow
- etc.
The script will search for matching files in the `quarto/contents/core/` directory.
## How It Works
### 1. Building the Chapter
```bash
./binder html intro
```
This builds just the introduction chapter to `quarto/_build/html/`.
### 2. Extracting Figures
The script parses the HTML looking for:
```html
<figure class="quarto-float quarto-float-fig figure">
  <div aria-describedby="fig-ai-timeline-caption-...">
    <img src="introduction_files/mediabag/25cf57367...svg" class="img-fluid figure-img">
  </div>
  <figcaption id="fig-ai-timeline-caption-...">
    Figure 2: AI Development Timeline: ...
  </figcaption>
</figure>
```
It extracts:
- Figure ID: `fig-ai-timeline` (from the caption ID)
- Image path: `introduction_files/mediabag/25cf57367...svg`
- Caption text: "Figure 2: AI Development Timeline: ..."
- Section heading: From the nearest preceding `<h1>`, `<h2>`, or `<h3>`
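The key matching trick is recovering the figure ID from the caption's `id`/`aria-describedby` value. A minimal sketch of that regex (the hash suffix in `caption_id` is made up for the example):

```python
import re

# Hypothetical caption id as produced by the Quarto HTML build
caption_id = "fig-ai-timeline-caption-0ceaefa1"
match = re.match(r"(fig-.+?)-caption", caption_id)
print(match.group(1))  # fig-ai-timeline
```

That recovered ID is what links the rendered figure back to its `.qmd` source.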
### 3. Generating Alt Text
For each image, the script:
- Encodes the image as base64
- Sends it to the vision model with context (caption, section)
- Gets back descriptive alt text following accessibility guidelines
**Prompt guidelines:**
- Be concise (1-2 sentences, ideally under 125 characters)
- Describe what's visually important, not what's obvious from the caption
- Focus on key information the image conveys
- For diagrams: structure, flow, relationships
- For graphs: trends, comparisons, insights
- Don't start with "Image of" or "Figure showing"
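These guidelines can double as a cheap post-generation sanity check. A sketch of such a validator (the function name and thresholds simply mirror the list above; it is not part of the shipped script):

```python
def check_alt_text(alt: str) -> list[str]:
    """Flag common violations of the alt-text guidelines above (sketch only)."""
    issues = []
    if not alt.strip():
        issues.append("empty")
    if len(alt) > 125:
        issues.append("longer than 125 characters")
    lowered = alt.lower()
    for bad_start in ("image of", "figure showing"):
        if lowered.startswith(bad_start):
            issues.append(f'starts with "{bad_start}"')
    return issues

print(check_alt_text("Image of a timeline"))  # ['starts with "image of"']
```

Running a check like this over generated alt text makes the manual review pass faster.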
### 4. Updating Source Files
The script searches `.qmd` files for figure references like:
- `![caption](image.png){#fig-ai-timeline}`
- `#| label: fig-ai-timeline`
- `id="fig-ai-timeline"`
It adds or updates the `fig-alt` attribute:
**Before:**
```markdown
![AI Development Timeline](images/timeline.svg){#fig-ai-timeline}
```
**After:**
```markdown
![AI Development Timeline](images/timeline.svg){#fig-ai-timeline fig-alt="Timeline showing evolution from symbolic AI in 1950s through neural networks to modern large language models"}
```
Or for block figures:
```markdown
#| label: fig-ai-timeline
#| fig-cap: "AI Development Timeline"
#| fig-alt: "Timeline showing evolution from symbolic AI in 1950s through neural networks to modern large language models"
```
## Output
The script provides:
- Progress logging to console and `generate_alt_text.log`
- Summary statistics at the end:
- Total figures found
- Alt text generated
- Files updated
- Errors encountered
## Troubleshooting
### "Could not find HTML file"
- Make sure the chapter name is correct
- Check that the build succeeded
- Look in `quarto/_build/html/contents/core/` for the HTML file
### "Could not find image file"
- The script tries multiple locations for images
- Check the build output directory structure
- Image might be in a subdirectory or mediabag
### "Could not find figure in .qmd file"
- The figure ID in HTML might not match the source
- Check the figure ID format in your `.qmd` files
- Try searching manually: `grep -r "fig-your-id" quarto/contents/core/`
### Ollama connection error
```bash
# Make sure Ollama is running
ollama serve
# Test it's working
curl http://localhost:11434/api/tags
```
### OpenAI API errors
- Check your API key is set: `echo $OPENAI_API_KEY`
- Verify you have credits: https://platform.openai.com/usage
- Check rate limits if processing many images
## Best Practices
1. **Start with dry run**: Always test with `--dry-run` first
2. **One chapter at a time**: Process chapters individually to catch issues early
3. **Review generated text**: Alt text quality matters for accessibility
4. **Use Ollama for testing**: Free and fast for iterating on the workflow
5. **Use OpenAI for production**: Generally produces higher quality descriptions
6. **Commit incrementally**: Commit each chapter separately for easier review
## Future Enhancements
Potential improvements:
- [ ] Batch processing of multiple chapters
- [ ] Alt text quality validation
- [ ] Support for updating existing alt text selectively
- [ ] Integration with the `binder` CLI
- [ ] Cache generated alt text to avoid regenerating
- [ ] Support for other image formats (PDF figures, etc.)
## Examples
### Successful Run
```bash
$ python generate_alt_text.py --chapter intro --provider ollama
2025-10-24 10:30:00 - INFO - Starting alt text generation for chapter: intro
2025-10-24 10:30:00 - INFO - Provider: ollama, Model: llava
2025-10-24 10:30:05 - INFO - Successfully built chapter: intro
2025-10-24 10:30:06 - INFO - Extracting figures from: .../introduction.html
2025-10-24 10:30:06 - INFO - Found figure: fig-ai-timeline
2025-10-24 10:30:06 - INFO - Found figure: fig-ml-workflow
2025-10-24 10:30:06 - INFO - Extracted 2 figures
2025-10-24 10:30:10 - INFO - Generating alt text for fig-ai-timeline using Ollama
2025-10-24 10:30:15 - INFO - Generated alt text: Timeline showing evolution...
2025-10-24 10:30:20 - INFO - Adding alt text to .../introduction.qmd for fig-ai-timeline
2025-10-24 10:30:20 - INFO - Successfully updated .../introduction.qmd
============================================================
ALT TEXT GENERATION SUMMARY
============================================================
Total figures found: 2
Alt text generated: 2
Files updated: 2
Errors: 0
============================================================
✅ Successfully processed chapter: intro
📝 Check the log file for details: generate_alt_text.log
```
### Dry Run
```bash
$ python generate_alt_text.py --chapter intro --provider ollama --dry-run
...
[DRY RUN] Would update .../introduction.qmd
[DRY RUN] Line 45: ![AI Timeline](images/timeline.svg){#fig-ai-timeline}
[DRY RUN] New line: ![AI Timeline](images/timeline.svg){#fig-ai-timeline fig-alt="Timeline..."}
...
⚠️ DRY RUN MODE - No files were actually modified
```
## Contributing
If you improve the alt text generation quality or workflow:
1. Test thoroughly with multiple chapters
2. Update this README with your changes
3. Add examples of improvements
4. Document any new dependencies or requirements
## Related Tools
Other genai tools in this directory:
- `header_update.py` - Update section headers
- `quizzes.py` - Generate quizzes
- `footnote_assistant.py` - Add scholarly footnotes
- `fix_dashes.py` - Fix dash usage in text


@@ -1,740 +0,0 @@
#!/usr/bin/env python3
"""
Generate Alt Text for Images

This script generates accessibility alt text for images in the Machine Learning Systems book.
It works by:
1. Building the book locally (using binder)
2. Parsing the HTML to find figures with their IDs
3. Sending images to a vision model (OpenAI or Ollama)
4. Matching figure IDs back to source .qmd files
5. Adding fig-alt attributes to the source

Usage:
    # Using OpenAI (requires OPENAI_API_KEY)
    python generate_alt_text.py --chapter intro --provider openai

    # Using Ollama (local, requires llava model)
    python generate_alt_text.py --chapter intro --provider ollama

    # Dry run (no changes to source files)
    python generate_alt_text.py --chapter intro --provider ollama --dry-run
"""
import argparse
import base64
import logging
import os
import re
import subprocess
from pathlib import Path
from typing import Dict, List, Optional, Tuple

import requests
from bs4 import BeautifulSoup
from openai import OpenAI

# Set up logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s',
    handlers=[
        logging.StreamHandler(),
        logging.FileHandler("generate_alt_text.log")
    ]
)
# Constants
REPO_ROOT = Path(__file__).resolve().parents[3]
QUARTO_DIR = REPO_ROOT / "quarto"
BUILD_DIR = QUARTO_DIR / "_build" / "html"
# Prompt template for alt text generation
ALT_TEXT_PROMPT = """You are an expert at creating accessible alt text for images in technical textbooks.
Your task is to write concise, informative alt text for this image from a machine learning systems textbook.
Guidelines:
- Be concise but descriptive (aim for 1-2 sentences, max 125 characters if possible)
- Describe what's visually important, not what's obvious from the caption
- Focus on the key information the image conveys
- For diagrams: describe the structure, flow, and relationships
- For graphs: describe the trend, comparison, or key insight
- For screenshots: describe the UI element and its purpose
- Don't start with "Image of" or "Figure showing"
- Don't repeat information from the caption
Context:
Caption: {caption}
Section: {section}
Provide only the alt text, nothing else."""
class OllamaClient:
    """Simple wrapper for the Ollama API that mimics the OpenAI client interface."""

    def __init__(self, base_url: str = "http://localhost:11434", model: str = "llava"):
        self.base_url = base_url.rstrip('/')
        self.model = model

    class ChatCompletions:
        def __init__(self, parent):
            self.parent = parent

        def create(self, model: str, messages: List[Dict], **kwargs):
            """Create a chat completion using the Ollama API with vision support."""
            # Convert OpenAI format to Ollama format
            ollama_messages = []
            for msg in messages:
                if isinstance(msg.get("content"), list):
                    # Handle vision messages with image content
                    text_parts = []
                    images = []
                    for item in msg["content"]:
                        if item["type"] == "text":
                            text_parts.append(item["text"])
                        elif item["type"] == "image_url":
                            # Extract base64 image data
                            image_url = item["image_url"]["url"]
                            if image_url.startswith("data:image"):
                                # Extract the base64 part after the data URI prefix
                                base64_data = image_url.split(",", 1)[1]
                                images.append(base64_data)
                    ollama_messages.append({
                        "role": msg["role"],
                        "content": " ".join(text_parts),
                        "images": images
                    })
                else:
                    ollama_messages.append({
                        "role": msg["role"],
                        "content": msg["content"]
                    })
            # Call Ollama API (timeout prevents hanging on a stalled server)
            url = f"{self.parent.base_url}/api/chat"
            payload = {
                "model": model,
                "messages": ollama_messages,
                "stream": False
            }
            response = requests.post(url, json=payload, timeout=300)
            if response.status_code == 200:
                result = response.json()

                # Convert to an OpenAI-like response object
                class Choice:
                    def __init__(self, content):
                        self.message = type('obj', (object,), {'content': content})()

                class Response:
                    def __init__(self, content):
                        self.choices = [Choice(content)]

                return Response(result["message"]["content"])
            else:
                raise Exception(f"Ollama API error: {response.status_code} - {response.text}")

    @property
    def chat(self):
        if not hasattr(self, '_chat'):
            self._chat = type('obj', (object,), {'completions': self.ChatCompletions(self)})()
        return self._chat
class FigureInfo:
    """Information about a figure extracted from HTML."""

    def __init__(self, figure_id: str, image_path: str, caption: str, section: str = ""):
        self.figure_id = figure_id
        self.image_path = image_path
        self.caption = caption
        self.section = section
        self.alt_text = None

    def __repr__(self):
        return f"FigureInfo(id={self.figure_id}, path={self.image_path})"
def build_chapter(chapter: str) -> bool:
    """
    Build a specific chapter using binder.

    Args:
        chapter: Chapter name (e.g., 'intro', 'introduction')

    Returns:
        True if build succeeded, False otherwise
    """
    logging.info(f"Building chapter: {chapter}")
    # Check if binder exists
    binder_path = REPO_ROOT / "binder"
    if not binder_path.exists():
        logging.error("binder script not found")
        return False
    try:
        # Run binder html <chapter>
        result = subprocess.run(
            ["./binder", "html", chapter],
            cwd=REPO_ROOT,
            capture_output=True,
            text=True,
            timeout=300  # 5 minute timeout
        )
        if result.returncode == 0:
            logging.info(f"Successfully built chapter: {chapter}")
            return True
        else:
            logging.error(f"Build failed: {result.stderr}")
            return False
    except subprocess.TimeoutExpired:
        logging.error("Build timed out after 5 minutes")
        return False
    except Exception as e:
        logging.error(f"Build error: {e}")
        return False
def find_html_file(chapter: str) -> Optional[Path]:
    """
    Find the HTML file for a chapter in the build directory.

    Args:
        chapter: Chapter name

    Returns:
        Path to HTML file or None if not found
    """
    # Common chapter name mappings
    chapter_files = {
        "intro": "introduction.html",
        "introduction": "introduction.html",
        "ml_systems": "ml_systems.html",
        "dl_primer": "dl_primer.html",
        # Add more mappings as needed
    }
    # Try direct mapping
    if chapter in chapter_files:
        html_file = BUILD_DIR / "contents" / "core" / chapter_files[chapter]
        if html_file.exists():
            return html_file
    # Fall back to glob patterns
    patterns = [
        BUILD_DIR / "contents" / "core" / f"{chapter}.html",
        BUILD_DIR / "contents" / "core" / f"*{chapter}*.html",
    ]
    for pattern in patterns:
        matches = list(BUILD_DIR.glob(str(pattern.relative_to(BUILD_DIR))))
        if matches:
            return matches[0]
    logging.error(f"Could not find HTML file for chapter: {chapter}")
    return None
def extract_figures_from_html(html_path: Path) -> List[FigureInfo]:
    """
    Extract figure information from an HTML file.

    Args:
        html_path: Path to HTML file

    Returns:
        List of FigureInfo objects
    """
    logging.info(f"Extracting figures from: {html_path}")
    with open(html_path, 'r', encoding='utf-8') as f:
        soup = BeautifulSoup(f.read(), 'html.parser')
    figures = []
    # Find all figure elements
    for figure in soup.find_all('figure', class_='quarto-float-fig'):
        # Extract figure ID from aria-describedby or figcaption id
        figure_id = None
        aria_desc = figure.find('div', attrs={'aria-describedby': True})
        if aria_desc:
            caption_id = aria_desc['aria-describedby']
            # Extract fig-xxx from fig-xxx-caption-...
            match = re.match(r'(fig-[^-]+(?:-[^-]+)*)-caption', caption_id)
            if match:
                figure_id = match.group(1)
        if not figure_id:
            logging.warning("Found figure without ID, skipping")
            continue
        # Extract image path
        img = figure.find('img')
        if not img or 'src' not in img.attrs:
            logging.warning(f"Figure {figure_id} has no image, skipping")
            continue
        image_path = img['src']
        # Extract caption
        figcaption = figure.find('figcaption')
        caption = figcaption.get_text(strip=True) if figcaption else ""
        # Find the nearest preceding section heading
        section = ""
        prev_heading = figure.find_previous(['h1', 'h2', 'h3'])
        if prev_heading:
            section = prev_heading.get_text(strip=True)
        fig_info = FigureInfo(figure_id, image_path, caption, section)
        figures.append(fig_info)
        logging.info(f"Found figure: {figure_id}")
    logging.info(f"Extracted {len(figures)} figures")
    return figures
def encode_image(image_path: Path) -> str:
    """
    Encode an image to a base64 string.

    Args:
        image_path: Path to image file

    Returns:
        Base64 encoded string
    """
    with open(image_path, "rb") as f:
        return base64.b64encode(f.read()).decode('utf-8')
def generate_alt_text_openai(client: OpenAI, figure: FigureInfo, image_path: Path) -> str:
    """
    Generate alt text using OpenAI's vision model.

    Args:
        client: OpenAI client
        figure: Figure information
        image_path: Path to image file

    Returns:
        Generated alt text
    """
    logging.info(f"Generating alt text for {figure.figure_id} using OpenAI")
    # Encode image
    base64_image = encode_image(image_path)
    # Prepare prompt
    prompt = ALT_TEXT_PROMPT.format(
        caption=figure.caption,
        section=figure.section
    )
    # Call OpenAI API
    response = client.chat.completions.create(
        model="gpt-4o",  # vision-capable model
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:image/png;base64,{base64_image}"
                        }
                    }
                ]
            }
        ],
        max_tokens=300
    )
    alt_text = response.choices[0].message.content.strip()
    logging.info(f"Generated alt text: {alt_text}")
    return alt_text
def generate_alt_text_ollama(client: OllamaClient, figure: FigureInfo, image_path: Path) -> str:
    """
    Generate alt text using Ollama's vision model.

    Args:
        client: Ollama client
        figure: Figure information
        image_path: Path to image file

    Returns:
        Generated alt text
    """
    logging.info(f"Generating alt text for {figure.figure_id} using Ollama")
    # Encode image
    base64_image = encode_image(image_path)
    # Prepare prompt
    prompt = ALT_TEXT_PROMPT.format(
        caption=figure.caption,
        section=figure.section
    )
    # Call Ollama API
    response = client.chat.completions.create(
        model=client.model,
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:image/png;base64,{base64_image}"
                        }
                    }
                ]
            }
        ]
    )
    alt_text = response.choices[0].message.content.strip()
    logging.info(f"Generated alt text: {alt_text}")
    return alt_text
def find_qmd_files(chapter: str) -> List[Path]:
    """
    Find .qmd files for a chapter.

    Args:
        chapter: Chapter name

    Returns:
        List of .qmd file paths
    """
    # Search in contents/core
    core_dir = QUARTO_DIR / "contents" / "core"
    # Try to find a directory matching the chapter name
    chapter_patterns = [
        chapter,
        f"*{chapter}*",
    ]
    qmd_files = []
    for pattern in chapter_patterns:
        matches = list(core_dir.glob(f"{pattern}/**/*.qmd"))
        qmd_files.extend(matches)
        # Also try direct files
        direct_matches = list(core_dir.glob(f"{pattern}.qmd"))
        qmd_files.extend(direct_matches)
    # Remove duplicates
    qmd_files = list(set(qmd_files))
    logging.info(f"Found {len(qmd_files)} .qmd files for chapter {chapter}")
    return qmd_files
def find_figure_in_qmd(figure_id: str, qmd_path: Path) -> Optional[Tuple[int, str]]:
    """
    Find a figure reference in a .qmd file.

    Args:
        figure_id: Figure ID to find (e.g., 'fig-ai-timeline')
        qmd_path: Path to .qmd file

    Returns:
        Tuple of (line_number, line_content) if found, None otherwise
    """
    with open(qmd_path, 'r', encoding='utf-8') as f:
        lines = f.readlines()
    # Look for the figure ID in its various forms
    patterns = [
        rf'\{{#{figure_id}\}}',      # {#fig-xxx}
        rf'#\| label: {figure_id}',  # #| label: fig-xxx
        rf'id="{figure_id}"',        # id="fig-xxx"
    ]
    for i, line in enumerate(lines):
        for pattern in patterns:
            if re.search(pattern, line):
                return (i, line)
    return None
def add_alt_text_to_qmd(figure_id: str, alt_text: str, qmd_path: Path, dry_run: bool = False) -> bool:
    """
    Add or update the fig-alt attribute in a .qmd file.

    Args:
        figure_id: Figure ID
        alt_text: Alt text to add
        qmd_path: Path to .qmd file
        dry_run: If True, don't actually modify the file

    Returns:
        True if successful, False otherwise
    """
    logging.info(f"Adding alt text to {qmd_path} for {figure_id}")
    with open(qmd_path, 'r', encoding='utf-8') as f:
        lines = f.readlines()
    # Find the figure
    result = find_figure_in_qmd(figure_id, qmd_path)
    if not result:
        logging.warning(f"Could not find {figure_id} in {qmd_path}")
        return False
    line_num, line_content = result
    # Check if fig-alt already exists
    if 'fig-alt' in line_content:
        logging.info(f"Figure {figure_id} already has fig-alt, updating")
        # Update existing fig-alt
        updated_line = re.sub(
            r'fig-alt="[^"]*"',
            f'fig-alt="{alt_text}"',
            line_content
        )
    else:
        # Add the fig-alt attribute, on the same line if it's an inline figure
        if '{#' in line_content:
            # Inline figure: ![caption](path){#fig-xxx}
            updated_line = line_content.replace(
                f'{{#{figure_id}}}',
                f'{{#{figure_id} fig-alt="{alt_text}"}}'
            )
        else:
            # Block figure with #| label: add fig-alt on the next line
            lines.insert(line_num + 1, f'#| fig-alt: "{alt_text}"\n')
            updated_line = None
    if updated_line:
        lines[line_num] = updated_line
    if dry_run:
        logging.info(f"[DRY RUN] Would update {qmd_path}")
        # line_num is 0-indexed; report a human-friendly 1-indexed line number
        logging.info(f"[DRY RUN] Line {line_num + 1}: {line_content.strip()}")
        if updated_line:
            logging.info(f"[DRY RUN] New line: {updated_line.strip()}")
        return True
    # Write back to file
    with open(qmd_path, 'w', encoding='utf-8') as f:
        f.writelines(lines)
    logging.info(f"Successfully updated {qmd_path}")
    return True
def process_chapter(chapter: str, provider: str, model: str, dry_run: bool = False) -> Dict:
    """
    Process a chapter to generate and add alt text.

    Args:
        chapter: Chapter name
        provider: 'openai' or 'ollama'
        model: Model name
        dry_run: If True, don't modify files

    Returns:
        Dictionary with processing statistics
    """
    stats = {
        "total_figures": 0,
        "alt_text_generated": 0,
        "files_updated": 0,
        "errors": 0
    }
    # Step 1: Build the chapter
    logging.info(f"=== Processing chapter: {chapter} ===")
    if not build_chapter(chapter):
        logging.error("Build failed, aborting")
        return stats
    # Step 2: Find the HTML file
    html_path = find_html_file(chapter)
    if not html_path:
        logging.error("Could not find HTML file, aborting")
        return stats
    # Step 3: Extract figures
    figures = extract_figures_from_html(html_path)
    stats["total_figures"] = len(figures)
    if not figures:
        logging.info("No figures found in this chapter")
        return stats
    # Step 4: Initialize client
    if provider == "openai":
        api_key = os.getenv("OPENAI_API_KEY")
        if not api_key:
            logging.error("OPENAI_API_KEY not set")
            return stats
        client = OpenAI(api_key=api_key)
    else:  # ollama
        client = OllamaClient(model=model)
    # Step 5: Generate alt text for each figure
    for figure in figures:
        try:
            # Find the actual image file
            image_rel_path = figure.image_path
            # Remove leading / if present
            if image_rel_path.startswith('/'):
                image_rel_path = image_rel_path[1:]
            # Try different possible locations
            possible_paths = [
                html_path.parent / image_rel_path,
                BUILD_DIR / image_rel_path,
                html_path.parent / Path(image_rel_path).name,
            ]
            image_path = None
            for path in possible_paths:
                if path.exists():
                    image_path = path
                    break
            if not image_path:
                logging.warning(f"Could not find image file: {figure.image_path}")
                stats["errors"] += 1
                continue
            # Generate alt text
            if provider == "openai":
                alt_text = generate_alt_text_openai(client, figure, image_path)
            else:
                alt_text = generate_alt_text_ollama(client, figure, image_path)
            figure.alt_text = alt_text
            stats["alt_text_generated"] += 1
        except Exception as e:
            logging.error(f"Error generating alt text for {figure.figure_id}: {e}")
            stats["errors"] += 1
            continue
    # Step 6: Find .qmd files
    qmd_files = find_qmd_files(chapter)
    # Step 7: Update .qmd files
    for figure in figures:
        if not figure.alt_text:
            continue
        # Find which .qmd file contains this figure
        found = False
        for qmd_path in qmd_files:
            if find_figure_in_qmd(figure.figure_id, qmd_path):
                if add_alt_text_to_qmd(figure.figure_id, figure.alt_text, qmd_path, dry_run):
                    stats["files_updated"] += 1
                    found = True
                break
        if not found:
            logging.warning(f"Could not find {figure.figure_id} in any .qmd file")
            stats["errors"] += 1
    return stats
def main():
    parser = argparse.ArgumentParser(
        description="Generate alt text for images in ML Systems book",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog=__doc__
    )
    parser.add_argument(
        "--chapter", "-c",
        required=True,
        help="Chapter to process (e.g., 'intro', 'ml_systems')"
    )
    parser.add_argument(
        "--provider",
        default="ollama",
        choices=["openai", "ollama"],
        help="AI provider to use (default: ollama)"
    )
    parser.add_argument(
        "--model",
        help="Model to use (default: gpt-4o for OpenAI, llava for Ollama)"
    )
    parser.add_argument(
        "--ollama-url",
        default="http://localhost:11434",
        help="Ollama API URL (default: http://localhost:11434)"
    )
    parser.add_argument(
        "--dry-run",
        action="store_true",
        help="Don't modify files, just show what would be done"
    )
    args = parser.parse_args()
    # Set default model
    if not args.model:
        if args.provider == "openai":
            args.model = "gpt-4o"
        else:
            args.model = "llava"
    logging.info(f"Starting alt text generation for chapter: {args.chapter}")
    logging.info(f"Provider: {args.provider}, Model: {args.model}")
    if args.dry_run:
        logging.info("DRY RUN MODE - No files will be modified")
    # Process the chapter
    stats = process_chapter(args.chapter, args.provider, args.model, args.dry_run)
    # Print summary
    print("\n" + "=" * 60)
    print("ALT TEXT GENERATION SUMMARY")
    print("=" * 60)
    print(f"Total figures found: {stats['total_figures']}")
    print(f"Alt text generated: {stats['alt_text_generated']}")
    print(f"Files updated: {stats['files_updated']}")
    print(f"Errors: {stats['errors']}")
    print("=" * 60)
    if args.dry_run:
        print("\n⚠️ DRY RUN MODE - No files were actually modified")
    else:
        print(f"\n✅ Successfully processed chapter: {args.chapter}")
        print("📝 Check the log file for details: generate_alt_text.log")

if __name__ == "__main__":
    main()