Revert "Merge branch 'feature/alt-text-generation' into dev"
This reverts commit 9e2bfe4e64, reversing changes made to 0b3f04d82d.
@@ -1,163 +0,0 @@
# Quick Start: Testing Alt Text Generation

## Immediate Next Steps

### 1. Install Dependencies (if not already installed)

```bash
cd /Users/VJ/GitHub/MLSysBook/tools/scripts/genai
pip install -r requirements.txt
```

### 2. Set Up Ollama for Local Testing

```bash
# Install Ollama from https://ollama.ai (if not installed)
# Then pull the vision model
ollama pull llava
```

### 3. Test with the Introduction Chapter (Dry Run)

```bash
cd /Users/VJ/GitHub/MLSysBook
python tools/scripts/genai/generate_alt_text.py --chapter intro --provider ollama --dry-run
```

This will:

1. Build the introduction chapter using `./binder html intro`
2. Extract all figures from the built HTML
3. Generate alt text using Ollama's llava model
4. Show what changes would be made (without actually modifying files)

### 4. Review the Output

Check the console output and `generate_alt_text.log` to see:

- How many figures were found
- What alt text was generated
- Which files would be updated

### 5. If Happy, Run for Real

```bash
python tools/scripts/genai/generate_alt_text.py --chapter intro --provider ollama
```

This will actually modify your `.qmd` files to add `fig-alt` attributes.

## What This Script Does Differently

**Key Innovation**: We use the **rendered output** as the source of truth because:

1. Your TikZ code is unreadable without rendering
2. The built HTML has all figures with clean IDs
3. We can see exactly what readers see
4. Figure IDs provide an exact match back to the source

**Workflow**:

```
Source .qmd → Build → Rendered HTML → Extract Figures → Vision AI → Alt Text → Update Source .qmd
     ↑                                                                                   |
     └───────────────────────────────────────────────────────────────────────────────────┘
```

## Example Output

For a figure like this in your HTML:

```html
<figure class="quarto-float quarto-float-fig figure">
  <figcaption>Figure 2: AI Development Timeline</figcaption>
  <img src="introduction_files/mediabag/25cf57367...svg">
</figure>
```

The script will (steps 1 and 6 are sketched in code below):

1. Extract the figure ID: `fig-ai-timeline`
2. Download/locate the image
3. Send it to the vision model with the caption as context
4. Get alt text: "Timeline showing evolution from symbolic AI in 1950s to modern LLMs"
5. Find the figure in the source: `{#fig-ai-timeline}`
6. Update it to: `{#fig-ai-timeline fig-alt="Timeline showing..."}`
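Here is a minimal sketch of those two steps, condensed from the full `generate_alt_text.py` included at the end of this commit (the function names are illustrative; the regex and the replacement format are the script's own):

```python
import re

def figure_id_from_caption(caption_id: str) -> str:
    """Step 1: recover 'fig-ai-timeline' from an HTML caption ID
    like 'fig-ai-timeline-caption-...'."""
    match = re.match(r"(fig-[^-]+(?:-[^-]+)*)-caption", caption_id)
    return match.group(1) if match else ""

def add_fig_alt(line: str, figure_id: str, alt_text: str) -> str:
    """Step 6: rewrite '{#fig-x}' as '{#fig-x fig-alt="..."}' in a .qmd line."""
    return line.replace(f'{{#{figure_id}}}',
                        f'{{#{figure_id} fig-alt="{alt_text}"}}')
```

For example, `figure_id_from_caption("fig-ai-timeline-caption-1")` returns `fig-ai-timeline`.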
## Testing Tips

1. **Start small**: Test with just the intro chapter first
2. **Use dry-run**: Always do a dry run first to preview changes
3. **Check the log**: `generate_alt_text.log` has detailed progress
4. **Review quality**: Read the generated alt text to ensure it's useful
5. **Iterate**: If quality isn't great, adjust the prompt in the script (see the sketch below)
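Tip 5 refers to the `ALT_TEXT_PROMPT` constant in `generate_alt_text.py` (reproduced in full at the end of this commit). A minimal sketch of iterating on it between test runs; the appended guideline is purely an illustrative example, not part of the script:

```python
# Illustrative only: append an extra guideline to the script's prompt
# template and inspect the result (the rule text is an example).
from generate_alt_text import ALT_TEXT_PROMPT

tweaked_prompt = ALT_TEXT_PROMPT + "\n- Spell out ML acronyms on first use."
print(tweaked_prompt)
```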
## Troubleshooting First Run

### If binder build fails:

```bash
# Test binder directly
cd /Users/VJ/GitHub/MLSysBook
./binder html intro
```

### If Ollama connection fails:

```bash
# Make sure Ollama is running
ollama serve

# Test that the server responds
curl http://localhost:11434/api/tags
```

### If figure matching fails:

The script logs which figures it finds and which .qmd files it searches.
Check the log to see whether:

- Figures were extracted from the HTML
- The figure IDs match what's in your .qmd files
- The .qmd files were found in the right directory

## Next Steps After Testing

Once you verify this works on the introduction chapter:

1. **Process other chapters**: Run for each chapter individually
2. **Review all changes**: Use `git diff` to review the added alt text
3. **Iterate on quality**: Adjust the prompt if needed for better descriptions
4. **Scale up**: Process the entire book chapter by chapter

## Scaling to Full Book

```bash
# Process each chapter
for chapter in intro ml_systems dl_primer ai_workflow data_engineering; do
    echo "Processing $chapter..."
    python tools/scripts/genai/generate_alt_text.py --chapter $chapter --provider ollama
    git add .
    git commit -m "feat(accessibility): Add alt text to $chapter chapter"
done
```

## Cost Considerations

**Ollama (Local)**:

- ✅ Free
- ✅ Unlimited usage
- ✅ Fast for testing
- ⚠️ May be less accurate than GPT-4 Vision

**OpenAI**:

- ⚠️ Costs per image (roughly $0.01-0.03 per image; a book with ~300 figures would run about $3-9)
- ✅ Higher-quality descriptions
- ✅ Better understanding of technical content
- 💡 Use for final production after testing the workflow with Ollama

## Questions to Consider

1. **Quality bar**: What level of detail do you want in alt text?
2. **Technical terms**: Should alt text use technical ML terminology?
3. **Length**: Do you prefer shorter (more accessible) or longer (more detailed) descriptions?
4. **Review process**: Who should review the generated alt text?

You can adjust the `ALT_TEXT_PROMPT` constant in the script to guide the model's behavior.

## Ready to Test?

Run this command to start:

```bash
cd /Users/VJ/GitHub/MLSysBook
python tools/scripts/genai/generate_alt_text.py --chapter intro --provider ollama --dry-run
```

Then check the output and let me know how it goes!

@@ -1,268 +0,0 @@
# Alt Text Generation Tool

This tool automatically generates accessible alt text for images in the Machine Learning Systems book by combining local builds with AI vision models.

## Overview

The alt text generator works by:

1. **Building locally**: Uses `./binder html <chapter>` to build the specified chapter
2. **Parsing HTML**: Extracts all `<figure>` elements with their IDs, captions, and context
3. **Image analysis**: Sends images to a vision model (OpenAI GPT-4 Vision or Ollama's llava)
4. **Source update**: Finds the corresponding figures in `.qmd` files and adds `fig-alt` attributes

## Why This Approach?

We scrape from the **built HTML** rather than directly from source because:

- Many figures use TikZ code, which is unreadable without rendering
- We get the actual visual output that readers see
- Context (captions, sections) is cleanly extracted from the rendered HTML
- Figure IDs provide an exact match back to the source

## Prerequisites

### For OpenAI (Cloud)

```bash
export OPENAI_API_KEY=your_api_key_here
```

### For Ollama (Local)

1. Install Ollama from https://ollama.ai
2. Pull the llava vision model:

```bash
ollama pull llava
```

### Python Dependencies

```bash
cd /Users/VJ/GitHub/MLSysBook/tools/scripts/genai
pip install -r requirements.txt
```

## Usage

### Basic Usage (Ollama, recommended for testing)

```bash
# Process the introduction chapter
python generate_alt_text.py --chapter intro --provider ollama

# Dry run to see what would happen without modifying files
python generate_alt_text.py --chapter intro --provider ollama --dry-run
```

### Using OpenAI

```bash
# Process the introduction chapter with OpenAI
python generate_alt_text.py --chapter intro --provider openai
```

### Advanced Options

```bash
# Use a different Ollama model
python generate_alt_text.py --chapter intro --provider ollama --model llava:13b

# Use a different OpenAI model
python generate_alt_text.py --chapter intro --provider openai --model gpt-4-vision-preview

# Specify the Ollama URL (if not localhost)
python generate_alt_text.py --chapter intro --provider ollama --ollama-url http://192.168.1.100:11434
```

## Chapter Names

Common chapter identifiers:

- `intro` or `introduction` - Introduction chapter
- `ml_systems` - ML Systems chapter
- `dl_primer` - Deep Learning Primer
- `ai_workflow` - AI Workflow
- etc.

The script searches for matching files in the `quarto/contents/core/` directory, as sketched below.
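A condensed sketch of that lookup, using the same glob patterns as `find_qmd_files` in the script at the end of this commit (error handling and logging omitted):

```python
from pathlib import Path
from typing import List

def find_qmd_files(chapter: str, core_dir: Path) -> List[Path]:
    """Collect candidate .qmd files for a chapter under quarto/contents/core/."""
    qmd_files: List[Path] = []
    for pattern in (chapter, f"*{chapter}*"):
        qmd_files.extend(core_dir.glob(f"{pattern}/**/*.qmd"))  # chapter directories
        qmd_files.extend(core_dir.glob(f"{pattern}.qmd"))       # standalone files
    return list(set(qmd_files))  # drop duplicates from overlapping patterns
```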
## How It Works

### 1. Building the Chapter

```bash
./binder html intro
```

This builds just the introduction chapter to `quarto/_build/html/`.

### 2. Extracting Figures

The script parses the HTML looking for:

```html
<figure class="quarto-float quarto-float-fig figure">
  <div aria-describedby="fig-ai-timeline-caption-...">
    <img src="introduction_files/mediabag/25cf57367...svg" class="img-fluid figure-img">
  </div>
  <figcaption id="fig-ai-timeline-caption-...">
    Figure 2: AI Development Timeline: ...
  </figcaption>
</figure>
```

It extracts (sketched in code below):

- Figure ID: `fig-ai-timeline` (from the caption ID)
- Image path: `introduction_files/mediabag/25cf57367...svg`
- Caption text: "Figure 2: AI Development Timeline: ..."
- Section heading: from the nearest preceding `<h1>`, `<h2>`, or `<h3>`
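In code, this is a straightforward BeautifulSoup walk; the sketch below condenses `extract_figures_from_html` from the script at the end of this commit (the dict shape is illustrative; the script uses a small `FigureInfo` class instead):

```python
import re
from typing import Dict, List

from bs4 import BeautifulSoup

def extract_figures(html: str) -> List[Dict]:
    """Pull (ID, image path, caption, section) for each Quarto figure."""
    soup = BeautifulSoup(html, "html.parser")
    figures = []
    for figure in soup.find_all("figure", class_="quarto-float-fig"):
        holder = figure.find("div", attrs={"aria-describedby": True})
        img = figure.find("img")
        if not holder or not img or "src" not in img.attrs:
            continue
        # 'fig-ai-timeline-caption-...' -> 'fig-ai-timeline'
        match = re.match(r"(fig-[^-]+(?:-[^-]+)*)-caption",
                         holder["aria-describedby"])
        if not match:
            continue
        caption = figure.find("figcaption")
        heading = figure.find_previous(["h1", "h2", "h3"])
        figures.append({
            "id": match.group(1),
            "src": img["src"],
            "caption": caption.get_text(strip=True) if caption else "",
            "section": heading.get_text(strip=True) if heading else "",
        })
    return figures
```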
### 3. Generating Alt Text

For each image, the script:

- Encodes the image as base64
- Sends it to the vision model with context (caption and section)
- Gets back descriptive alt text following accessibility guidelines (see the sketch below)
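A minimal sketch of the OpenAI variant, condensed from `generate_alt_text_openai` in the script at the end of this commit (the model name and `max_tokens` follow the script's defaults; the Ollama path builds the same message shape):

```python
import base64
from pathlib import Path

from openai import OpenAI

def generate_alt_text(image_path: Path, prompt: str, model: str = "gpt-4o") -> str:
    """Send one base64-encoded image plus the prompt to a vision model."""
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    b64 = base64.b64encode(image_path.read_bytes()).decode("utf-8")
    response = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
        max_tokens=300,
    )
    return response.choices[0].message.content.strip()
```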
**Prompt guidelines:**

- Be concise (1-2 sentences, ideally under 125 characters)
- Describe what's visually important, not what's obvious from the caption
- Focus on the key information the image conveys
- For diagrams: structure, flow, relationships
- For graphs: trends, comparisons, insights
- Don't start with "Image of" or "Figure showing"

### 4. Updating Source Files

The script searches `.qmd` files for figure references in three forms (the exact regexes are sketched below):

- `{#fig-ai-timeline}`
- `#| label: fig-ai-timeline`
- `id="fig-ai-timeline"`
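These are the patterns used by `find_figure_in_qmd` in the script at the end of this commit:

```python
import re
from typing import List

def figure_patterns(figure_id: str) -> List[str]:
    """The three reference styles a figure ID can take in a .qmd file."""
    return [
        rf"\{{#{figure_id}\}}",       # {#fig-xxx}
        rf"#\| label: {figure_id}",   # #| label: fig-xxx
        rf'id="{figure_id}"',         # id="fig-xxx"
    ]

def line_references_figure(line: str, figure_id: str) -> bool:
    return any(re.search(p, line) for p in figure_patterns(figure_id))
```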
It adds or updates the `fig-alt` attribute:

**Before:**

```markdown
{#fig-ai-timeline}
```

**After:**

```markdown
{#fig-ai-timeline fig-alt="Timeline showing evolution from symbolic AI in 1950s through neural networks to modern large language models"}
```

Or, for block figures:

```markdown
#| label: fig-ai-timeline
#| fig-cap: "AI Development Timeline"
#| fig-alt: "Timeline showing evolution from symbolic AI in 1950s through neural networks to modern large language models"
```

## Output

The script provides:

- Progress logging to the console and `generate_alt_text.log`
- Summary statistics at the end:
  - Total figures found
  - Alt text generated
  - Files updated
  - Errors encountered

## Troubleshooting

### "Could not find HTML file"

- Make sure the chapter name is correct
- Check that the build succeeded
- Look in `quarto/_build/html/contents/core/` for the HTML file

### "Could not find image file"

- The script tries multiple locations for images
- Check the build output directory structure
- The image might be in a subdirectory or mediabag

### "Could not find figure in .qmd file"

- The figure ID in the HTML might not match the source
- Check the figure ID format in your `.qmd` files
- Try searching manually: `grep -r "fig-your-id" quarto/contents/core/`

### Ollama connection error

```bash
# Make sure Ollama is running
ollama serve

# Test that it's working
curl http://localhost:11434/api/tags
```

### OpenAI API errors

- Check that your API key is set: `echo $OPENAI_API_KEY`
- Verify that you have credits: https://platform.openai.com/usage
- Check rate limits if processing many images

## Best Practices

1. **Start with a dry run**: Always test with `--dry-run` first
2. **One chapter at a time**: Process chapters individually to catch issues early
3. **Review generated text**: Alt text quality matters for accessibility
4. **Use Ollama for testing**: Free and fast for iterating on the workflow
5. **Use OpenAI for production**: Generally produces higher-quality descriptions
6. **Commit incrementally**: Commit each chapter separately for easier review

## Future Enhancements

Potential improvements:

- [ ] Batch processing of multiple chapters
- [ ] Alt text quality validation
- [ ] Support for selectively updating existing alt text
- [ ] Integration with the `binder` CLI
- [ ] Caching generated alt text to avoid regenerating it
- [ ] Support for other image formats (PDF figures, etc.)

## Examples

### Successful Run

```bash
$ python generate_alt_text.py --chapter intro --provider ollama

2025-10-24 10:30:00 - INFO - Starting alt text generation for chapter: intro
2025-10-24 10:30:00 - INFO - Provider: ollama, Model: llava
2025-10-24 10:30:05 - INFO - Successfully built chapter: intro
2025-10-24 10:30:06 - INFO - Extracting figures from: .../introduction.html
2025-10-24 10:30:06 - INFO - Found figure: fig-ai-timeline
2025-10-24 10:30:06 - INFO - Found figure: fig-ml-workflow
2025-10-24 10:30:06 - INFO - Extracted 2 figures
2025-10-24 10:30:10 - INFO - Generating alt text for fig-ai-timeline using Ollama
2025-10-24 10:30:15 - INFO - Generated alt text: Timeline showing evolution...
2025-10-24 10:30:20 - INFO - Adding alt text to .../introduction.qmd for fig-ai-timeline
2025-10-24 10:30:20 - INFO - Successfully updated .../introduction.qmd

============================================================
ALT TEXT GENERATION SUMMARY
============================================================
Total figures found: 2
Alt text generated: 2
Files updated: 2
Errors: 0
============================================================

✅ Successfully processed chapter: intro
📝 Check the log file for details: generate_alt_text.log
```

### Dry Run

```bash
$ python generate_alt_text.py --chapter intro --provider ollama --dry-run

...
[DRY RUN] Would update .../introduction.qmd
[DRY RUN] Line 45: {#fig-ai-timeline}
[DRY RUN] New line: {#fig-ai-timeline fig-alt="Timeline..."}
...

⚠️ DRY RUN MODE - No files were actually modified
```

## Contributing

If you improve the alt text generation quality or workflow:

1. Test thoroughly with multiple chapters
2. Update this README with your changes
3. Add examples of improvements
4. Document any new dependencies or requirements

## Related Tools

Other genai tools in this directory:

- `header_update.py` - Update section headers
- `quizzes.py` - Generate quizzes
- `footnote_assistant.py` - Add scholarly footnotes
- `fix_dashes.py` - Fix dash usage in text

@@ -1,740 +0,0 @@
#!/usr/bin/env python3
"""
Generate Alt Text for Images

This script generates accessibility alt text for images in the Machine Learning Systems book.
It works by:
1. Building the book locally (using binder)
2. Parsing the HTML to find figures with their IDs
3. Sending images to a vision model (OpenAI or Ollama)
4. Matching figure IDs back to source .qmd files
5. Adding fig-alt attributes to the source

Usage:
    # Using OpenAI (requires OPENAI_API_KEY)
    python generate_alt_text.py --chapter intro --provider openai

    # Using Ollama (local, requires llava model)
    python generate_alt_text.py --chapter intro --provider ollama

    # Dry run (no changes to source files)
    python generate_alt_text.py --chapter intro --provider ollama --dry-run
"""

import argparse
import base64
import logging
import os
import re
import subprocess
from pathlib import Path
from typing import Dict, List, Optional, Tuple

import requests
from bs4 import BeautifulSoup
from openai import OpenAI

# Set up logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s',
    handlers=[
        logging.StreamHandler(),
        logging.FileHandler("generate_alt_text.log")
    ]
)

# Constants
REPO_ROOT = Path(__file__).resolve().parents[3]
QUARTO_DIR = REPO_ROOT / "quarto"
BUILD_DIR = QUARTO_DIR / "_build" / "html"

# Prompt template for alt text generation
ALT_TEXT_PROMPT = """You are an expert at creating accessible alt text for images in technical textbooks.

Your task is to write concise, informative alt text for this image from a machine learning systems textbook.

Guidelines:
- Be concise but descriptive (aim for 1-2 sentences, max 125 characters if possible)
- Describe what's visually important, not what's obvious from the caption
- Focus on the key information the image conveys
- For diagrams: describe the structure, flow, and relationships
- For graphs: describe the trend, comparison, or key insight
- For screenshots: describe the UI element and its purpose
- Don't start with "Image of" or "Figure showing"
- Don't repeat information from the caption

Context:
Caption: {caption}
Section: {section}

Provide only the alt text, nothing else."""


class OllamaClient:
    """Simple wrapper for the Ollama API that mimics the OpenAI client interface."""

    def __init__(self, base_url: str = "http://localhost:11434", model: str = "llava"):
        self.base_url = base_url.rstrip('/')
        self.model = model

    class ChatCompletions:
        def __init__(self, parent):
            self.parent = parent

        def create(self, model: str, messages: List[Dict], **kwargs):
            """Create a chat completion using the Ollama API with vision support."""
            # Convert OpenAI-style messages to Ollama format
            ollama_messages = []

            for msg in messages:
                if isinstance(msg.get("content"), list):
                    # Vision message: split text parts from image parts
                    text_parts = []
                    images = []

                    for item in msg["content"]:
                        if item["type"] == "text":
                            text_parts.append(item["text"])
                        elif item["type"] == "image_url":
                            # Extract the base64 payload from a data: URL
                            image_url = item["image_url"]["url"]
                            if image_url.startswith("data:image"):
                                base64_data = image_url.split(",", 1)[1]
                                images.append(base64_data)

                    ollama_messages.append({
                        "role": msg["role"],
                        "content": " ".join(text_parts),
                        "images": images
                    })
                else:
                    ollama_messages.append({
                        "role": msg["role"],
                        "content": msg["content"]
                    })

            # Call the Ollama API
            url = f"{self.parent.base_url}/api/chat"
            payload = {
                "model": model,
                "messages": ollama_messages,
                "stream": False
            }

            response = requests.post(url, json=payload, timeout=300)

            if response.status_code == 200:
                result = response.json()

                # Convert to an OpenAI-like response object
                class Choice:
                    def __init__(self, content):
                        self.message = type('obj', (object,), {'content': content})()

                class Response:
                    def __init__(self, content):
                        self.choices = [Choice(content)]

                return Response(result["message"]["content"])
            else:
                raise Exception(f"Ollama API error: {response.status_code} - {response.text}")

    @property
    def chat(self):
        if not hasattr(self, '_chat'):
            self._chat = type('obj', (object,), {'completions': self.ChatCompletions(self)})()
        return self._chat


class FigureInfo:
    """Information about a figure extracted from HTML."""

    def __init__(self, figure_id: str, image_path: str, caption: str, section: str = ""):
        self.figure_id = figure_id
        self.image_path = image_path
        self.caption = caption
        self.section = section
        self.alt_text = None

    def __repr__(self):
        return f"FigureInfo(id={self.figure_id}, path={self.image_path})"


def build_chapter(chapter: str) -> bool:
    """
    Build a specific chapter using binder.

    Args:
        chapter: Chapter name (e.g., 'intro', 'introduction')

    Returns:
        True if the build succeeded, False otherwise
    """
    logging.info(f"Building chapter: {chapter}")

    # Check that the binder script exists
    binder_path = REPO_ROOT / "binder"
    if not binder_path.exists():
        logging.error("binder script not found")
        return False

    try:
        # Run: ./binder html <chapter>
        result = subprocess.run(
            ["./binder", "html", chapter],
            cwd=REPO_ROOT,
            capture_output=True,
            text=True,
            timeout=300  # 5 minute timeout
        )

        if result.returncode == 0:
            logging.info(f"Successfully built chapter: {chapter}")
            return True
        else:
            logging.error(f"Build failed: {result.stderr}")
            return False

    except subprocess.TimeoutExpired:
        logging.error("Build timed out after 5 minutes")
        return False
    except Exception as e:
        logging.error(f"Build error: {e}")
        return False


def find_html_file(chapter: str) -> Optional[Path]:
    """
    Find the HTML file for a chapter in the build directory.

    Args:
        chapter: Chapter name

    Returns:
        Path to the HTML file, or None if not found
    """
    # Common chapter name mappings
    chapter_files = {
        "intro": "introduction.html",
        "introduction": "introduction.html",
        "ml_systems": "ml_systems.html",
        "dl_primer": "dl_primer.html",
        # Add more mappings as needed
    }

    # Try the direct mapping first
    if chapter in chapter_files:
        html_file = BUILD_DIR / "contents" / "core" / chapter_files[chapter]
        if html_file.exists():
            return html_file

    # Fall back to glob searches
    patterns = [
        BUILD_DIR / "contents" / "core" / f"{chapter}.html",
        BUILD_DIR / "contents" / "core" / f"*{chapter}*.html",
    ]

    for pattern in patterns:
        matches = list(BUILD_DIR.glob(str(pattern.relative_to(BUILD_DIR))))
        if matches:
            return matches[0]

    logging.error(f"Could not find HTML file for chapter: {chapter}")
    return None


def extract_figures_from_html(html_path: Path) -> List[FigureInfo]:
    """
    Extract figure information from an HTML file.

    Args:
        html_path: Path to the HTML file

    Returns:
        List of FigureInfo objects
    """
    logging.info(f"Extracting figures from: {html_path}")

    with open(html_path, 'r', encoding='utf-8') as f:
        soup = BeautifulSoup(f.read(), 'html.parser')

    figures = []

    # Find all Quarto figure elements
    for figure in soup.find_all('figure', class_='quarto-float-fig'):
        # Extract the figure ID from the aria-describedby / figcaption id
        figure_id = None
        aria_desc = figure.find('div', attrs={'aria-describedby': True})
        if aria_desc:
            caption_id = aria_desc['aria-describedby']
            # Recover fig-xxx from fig-xxx-caption-...
            match = re.match(r'(fig-[^-]+(?:-[^-]+)*)-caption', caption_id)
            if match:
                figure_id = match.group(1)

        if not figure_id:
            logging.warning("Found figure without ID, skipping")
            continue

        # Extract the image path
        img = figure.find('img')
        if not img or 'src' not in img.attrs:
            logging.warning(f"Figure {figure_id} has no image, skipping")
            continue

        image_path = img['src']

        # Extract the caption
        figcaption = figure.find('figcaption')
        caption = figcaption.get_text(strip=True) if figcaption else ""

        # Use the nearest preceding heading as the section context
        section = ""
        prev_heading = figure.find_previous(['h1', 'h2', 'h3'])
        if prev_heading:
            section = prev_heading.get_text(strip=True)

        fig_info = FigureInfo(figure_id, image_path, caption, section)
        figures.append(fig_info)
        logging.info(f"Found figure: {figure_id}")

    logging.info(f"Extracted {len(figures)} figures")
    return figures


def encode_image(image_path: Path) -> str:
    """
    Encode an image as a base64 string.

    Args:
        image_path: Path to the image file

    Returns:
        Base64-encoded string
    """
    with open(image_path, "rb") as f:
        return base64.b64encode(f.read()).decode('utf-8')


def generate_alt_text_openai(client: OpenAI, figure: FigureInfo, image_path: Path,
                             model: str = "gpt-4o") -> str:
    """
    Generate alt text using an OpenAI vision model.

    Args:
        client: OpenAI client
        figure: Figure information
        image_path: Path to the image file
        model: Vision-capable model name (default: gpt-4o)

    Returns:
        Generated alt text
    """
    logging.info(f"Generating alt text for {figure.figure_id} using OpenAI")

    # Encode the image
    base64_image = encode_image(image_path)

    # Prepare the prompt
    prompt = ALT_TEXT_PROMPT.format(
        caption=figure.caption,
        section=figure.section
    )

    # Call the OpenAI API
    # NOTE: the data URL assumes a PNG payload; adjust if your figures differ
    response = client.chat.completions.create(
        model=model,
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:image/png;base64,{base64_image}"
                        }
                    }
                ]
            }
        ],
        max_tokens=300
    )

    alt_text = response.choices[0].message.content.strip()
    logging.info(f"Generated alt text: {alt_text}")
    return alt_text


def generate_alt_text_ollama(client: OllamaClient, figure: FigureInfo, image_path: Path) -> str:
    """
    Generate alt text using an Ollama vision model.

    Args:
        client: Ollama client
        figure: Figure information
        image_path: Path to the image file

    Returns:
        Generated alt text
    """
    logging.info(f"Generating alt text for {figure.figure_id} using Ollama")

    # Encode the image
    base64_image = encode_image(image_path)

    # Prepare the prompt
    prompt = ALT_TEXT_PROMPT.format(
        caption=figure.caption,
        section=figure.section
    )

    # Call the Ollama API
    response = client.chat.completions.create(
        model=client.model,
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:image/png;base64,{base64_image}"
                        }
                    }
                ]
            }
        ]
    )

    alt_text = response.choices[0].message.content.strip()
    logging.info(f"Generated alt text: {alt_text}")
    return alt_text


def find_qmd_files(chapter: str) -> List[Path]:
    """
    Find the .qmd files for a chapter.

    Args:
        chapter: Chapter name

    Returns:
        List of .qmd file paths
    """
    # Search in contents/core
    core_dir = QUARTO_DIR / "contents" / "core"

    # Try to find a directory matching the chapter name
    chapter_patterns = [
        chapter,
        f"*{chapter}*",
    ]

    qmd_files = []
    for pattern in chapter_patterns:
        matches = list(core_dir.glob(f"{pattern}/**/*.qmd"))
        qmd_files.extend(matches)

        # Also try standalone files
        direct_matches = list(core_dir.glob(f"{pattern}.qmd"))
        qmd_files.extend(direct_matches)

    # Remove duplicates
    qmd_files = list(set(qmd_files))

    logging.info(f"Found {len(qmd_files)} .qmd files for chapter {chapter}")
    return qmd_files


def find_figure_in_qmd(figure_id: str, qmd_path: Path) -> Optional[Tuple[int, str]]:
    """
    Find a figure reference in a .qmd file.

    Args:
        figure_id: Figure ID to find (e.g., 'fig-ai-timeline')
        qmd_path: Path to the .qmd file

    Returns:
        Tuple of (line_number, line_content) if found, None otherwise
    """
    with open(qmd_path, 'r', encoding='utf-8') as f:
        lines = f.readlines()

    # Look for the figure ID in its various forms
    patterns = [
        rf'\{{#{figure_id}\}}',       # {#fig-xxx}
        rf'#\| label: {figure_id}',   # #| label: fig-xxx
        rf'id="{figure_id}"',         # id="fig-xxx"
    ]

    for i, line in enumerate(lines):
        for pattern in patterns:
            if re.search(pattern, line):
                return (i, line)

    return None


def add_alt_text_to_qmd(figure_id: str, alt_text: str, qmd_path: Path, dry_run: bool = False) -> bool:
    """
    Add or update the fig-alt attribute in a .qmd file.

    Args:
        figure_id: Figure ID
        alt_text: Alt text to add
        qmd_path: Path to the .qmd file
        dry_run: If True, don't actually modify the file

    Returns:
        True if successful, False otherwise
    """
    logging.info(f"Adding alt text to {qmd_path} for {figure_id}")

    with open(qmd_path, 'r', encoding='utf-8') as f:
        lines = f.readlines()

    # Find the figure
    result = find_figure_in_qmd(figure_id, qmd_path)
    if not result:
        logging.warning(f"Could not find {figure_id} in {qmd_path}")
        return False

    line_num, line_content = result

    # Check whether fig-alt already exists
    if 'fig-alt' in line_content:
        logging.info(f"Figure {figure_id} already has fig-alt, updating")
        # Update the existing fig-alt
        updated_line = re.sub(
            r'fig-alt="[^"]*"',
            f'fig-alt="{alt_text}"',
            line_content
        )
    else:
        # Add a fig-alt attribute
        if '{#' in line_content:
            # Inline figure: {#fig-xxx}
            updated_line = line_content.replace(
                f'{{#{figure_id}}}',
                f'{{#{figure_id} fig-alt="{alt_text}"}}'
            )
        else:
            # Block figure with #| label: add fig-alt on the next line
            lines.insert(line_num + 1, f'#| fig-alt: "{alt_text}"\n')
            updated_line = None

    if updated_line:
        lines[line_num] = updated_line

    if dry_run:
        logging.info(f"[DRY RUN] Would update {qmd_path}")
        logging.info(f"[DRY RUN] Line {line_num}: {line_content.strip()}")
        if updated_line:
            logging.info(f"[DRY RUN] New line: {updated_line.strip()}")
        return True

    # Write the file back
    with open(qmd_path, 'w', encoding='utf-8') as f:
        f.writelines(lines)

    logging.info(f"Successfully updated {qmd_path}")
    return True


def process_chapter(chapter: str, provider: str, model: str,
                    ollama_url: str = "http://localhost:11434",
                    dry_run: bool = False) -> Dict:
    """
    Process a chapter to generate and add alt text.

    Args:
        chapter: Chapter name
        provider: 'openai' or 'ollama'
        model: Model name
        ollama_url: Base URL of the Ollama API
        dry_run: If True, don't modify files

    Returns:
        Dictionary with processing statistics
    """
    stats = {
        "total_figures": 0,
        "alt_text_generated": 0,
        "files_updated": 0,
        "errors": 0
    }

    # Step 1: Build the chapter
    logging.info(f"=== Processing chapter: {chapter} ===")
    if not build_chapter(chapter):
        logging.error("Build failed, aborting")
        return stats

    # Step 2: Find the HTML file
    html_path = find_html_file(chapter)
    if not html_path:
        logging.error("Could not find HTML file, aborting")
        return stats

    # Step 3: Extract figures
    figures = extract_figures_from_html(html_path)
    stats["total_figures"] = len(figures)

    if not figures:
        logging.info("No figures found in this chapter")
        return stats

    # Step 4: Initialize the client
    if provider == "openai":
        api_key = os.getenv("OPENAI_API_KEY")
        if not api_key:
            logging.error("OPENAI_API_KEY not set")
            return stats
        client = OpenAI(api_key=api_key)
    else:  # ollama
        client = OllamaClient(base_url=ollama_url, model=model)

    # Step 5: Generate alt text for each figure
    for figure in figures:
        try:
            # Find the actual image file
            image_rel_path = figure.image_path
            # Remove a leading / if present
            if image_rel_path.startswith('/'):
                image_rel_path = image_rel_path[1:]

            # Try the possible locations in order
            possible_paths = [
                html_path.parent / image_rel_path,
                BUILD_DIR / image_rel_path,
                html_path.parent / Path(image_rel_path).name,
            ]

            image_path = None
            for path in possible_paths:
                if path.exists():
                    image_path = path
                    break

            if not image_path:
                logging.warning(f"Could not find image file: {figure.image_path}")
                stats["errors"] += 1
                continue

            # Generate the alt text
            if provider == "openai":
                alt_text = generate_alt_text_openai(client, figure, image_path, model)
            else:
                alt_text = generate_alt_text_ollama(client, figure, image_path)

            figure.alt_text = alt_text
            stats["alt_text_generated"] += 1

        except Exception as e:
            logging.error(f"Error generating alt text for {figure.figure_id}: {e}")
            stats["errors"] += 1
            continue

    # Step 6: Find the .qmd files
    qmd_files = find_qmd_files(chapter)

    # Step 7: Update the .qmd files
    for figure in figures:
        if not figure.alt_text:
            continue

        # Find which .qmd file contains this figure
        found = False
        for qmd_path in qmd_files:
            if find_figure_in_qmd(figure.figure_id, qmd_path):
                if add_alt_text_to_qmd(figure.figure_id, figure.alt_text, qmd_path, dry_run):
                    stats["files_updated"] += 1
                    found = True
                break

        if not found:
            logging.warning(f"Could not find {figure.figure_id} in any .qmd file")
            stats["errors"] += 1

    return stats


def main():
    parser = argparse.ArgumentParser(
        description="Generate alt text for images in the ML Systems book",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog=__doc__
    )

    parser.add_argument(
        "--chapter", "-c",
        required=True,
        help="Chapter to process (e.g., 'intro', 'ml_systems')"
    )

    parser.add_argument(
        "--provider",
        default="ollama",
        choices=["openai", "ollama"],
        help="AI provider to use (default: ollama)"
    )

    parser.add_argument(
        "--model",
        help="Model to use (default: gpt-4o for OpenAI, llava for Ollama)"
    )

    parser.add_argument(
        "--ollama-url",
        default="http://localhost:11434",
        help="Ollama API URL (default: http://localhost:11434)"
    )

    parser.add_argument(
        "--dry-run",
        action="store_true",
        help="Don't modify files, just show what would be done"
    )

    args = parser.parse_args()

    # Set the default model per provider
    if not args.model:
        if args.provider == "openai":
            args.model = "gpt-4o"
        else:
            args.model = "llava"

    logging.info(f"Starting alt text generation for chapter: {args.chapter}")
    logging.info(f"Provider: {args.provider}, Model: {args.model}")
    if args.dry_run:
        logging.info("DRY RUN MODE - No files will be modified")

    # Process the chapter
    stats = process_chapter(args.chapter, args.provider, args.model,
                            args.ollama_url, args.dry_run)

    # Print the summary
    print("\n" + "=" * 60)
    print("ALT TEXT GENERATION SUMMARY")
    print("=" * 60)
    print(f"Total figures found: {stats['total_figures']}")
    print(f"Alt text generated: {stats['alt_text_generated']}")
    print(f"Files updated: {stats['files_updated']}")
    print(f"Errors: {stats['errors']}")
    print("=" * 60)

    if args.dry_run:
        print("\n⚠️ DRY RUN MODE - No files were actually modified")
    else:
        print(f"\n✅ Successfully processed chapter: {args.chapter}")
        print("📝 Check the log file for details: generate_alt_text.log")


if __name__ == "__main__":
    main()