Merge branch 'feature/enforce-snake-case' into dev

This commit is contained in:
Vijay Janapa Reddi
2025-08-11 12:29:23 -04:00
25 changed files with 118 additions and 458 deletions

View File

@@ -54,7 +54,7 @@ repos:
# --- Content Formatting ---
- id: collapse-extra-blank-lines
name: "Collapse extra blank lines"
entry: python tools/scripts/content/collapse_blank_lines.py
entry: python tools/scripts/content/format_blank_lines.py
language: python
pass_filenames: true
files: ^quarto/contents/.*\.qmd$
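For context, a whitespace-normalization hook like this one typically reduces runs of blank lines to a single blank line. A minimal sketch of what a script such as `format_blank_lines.py` might contain (the real script's behavior may differ):

```python
#!/usr/bin/env python3
"""Collapse runs of blank lines in the .qmd files pre-commit passes in."""
import re
import sys


def collapse_blank_lines(text: str) -> str:
    # Replace any run of 2+ consecutive blank lines with a single blank line.
    return re.sub(r"\n{3,}", "\n\n", text)


if __name__ == "__main__":
    for path in sys.argv[1:]:  # pre-commit passes the staged filenames
        with open(path, encoding="utf-8") as f:
            original = f.read()
        collapsed = collapse_blank_lines(original)
        if collapsed != original:
            with open(path, "w", encoding="utf-8") as f:
                f.write(collapsed)
```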
@@ -127,7 +127,7 @@ repos:
# --- Structural & Reference Validation ---
- id: check-unreferenced-labels
name: "Check for unreferenced labels"
entry: python ./tools/scripts/content/find_unreferenced_labels.py ./quarto/contents/core
entry: python ./tools/scripts/content/check_unreferenced_labels.py ./quarto/contents/core
language: python
additional_dependencies: []
pass_filenames: false
@@ -135,7 +135,7 @@ repos:
- id: check-duplicate-labels
name: "Check for duplicate labels"
entry: python3 tools/scripts/content/find_duplicate_labels.py
entry: python3 tools/scripts/content/check_duplicate_labels.py
args: ['-d', 'quarto/contents/', '--figures', '--tables', '--listings', '--quiet', '--strict']
language: system
pass_filenames: false
@@ -175,7 +175,7 @@ repos:
# --- Image Validation ---
- id: validate-images
name: "Validate image files"
entry: python tools/scripts/utilities/check_images.py
entry: python tools/scripts/utilities/manage_images.py
language: python
additional_dependencies:
- pillow
@@ -185,7 +185,7 @@ repos:
- id: validate-external-images
name: "Check for external images in Quarto files"
entry: python3 tools/scripts/download_external_images.py --validate quarto/contents/
entry: python3 tools/scripts/manage_external_images.py --validate quarto/contents/
language: system
pass_filenames: false
files: ^quarto/contents/.*\.qmd$

View File

@@ -1 +1 @@
config/_quarto-html.yml
config/_quarto-pdf.yml

View File

@@ -1353,7 +1353,7 @@ To build effective machine learning systems, we must first understand how differ
@fig-labels illustrates the common label types:
![**Data Annotation Granularity**: Increasing levels of detail in data labeling—from bounding boxes to pixel-level segmentation—impact both annotation cost and potential model accuracy. Fine-grained segmentation provides richer information for training but demands significantly more labeling effort and storage capacity than coarser annotations.](images/png/CS249r_Labels_new.png){#fig-labels width=90%}
![**Data Annotation Granularity**: Increasing levels of detail in data labeling—from bounding boxes to pixel-level segmentation—impact both annotation cost and potential model accuracy. Fine-grained segmentation provides richer information for training but demands significantly more labeling effort and storage capacity than coarser annotations.](images/png/cs249r_labels_new.png){#fig-labels width=90%}
The choice of label format depends heavily on our system requirements and resource constraints [@10.1109/ICRA.2017.7989092]. While classification labels might suffice for simple traffic counting, autonomous vehicles need detailed segmentation maps to make precise navigation decisions. Leading autonomous vehicle companies often maintain hybrid systems that store multiple label types for the same data, allowing flexible use across different applications.
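The trade-off is easiest to see in code. An illustrative sketch (not from the book's codebase; names, shapes, and coordinates are invented) of the same frame stored under each label type:

```python
import numpy as np

image_id = "frame_00042.png"  # hypothetical frame from a traffic camera

# Classification: one label for the whole image (cheapest to annotate).
classification_label = "traffic"

# Detection: bounding boxes as (x_min, y_min, x_max, y_max, class).
bounding_boxes = [
    (12, 40, 118, 96, "car"),
    (140, 35, 210, 110, "truck"),
]

# Segmentation: a per-pixel class-index mask the size of the image
# (richest training signal, but costliest to label and to store).
segmentation_mask = np.zeros((240, 320), dtype=np.uint8)  # 0 = background
segmentation_mask[40:96, 12:118] = 1                      # 1 = "car" pixels

# Storage grows with granularity: a string, a few tuples, or a full mask.
print(len(classification_label), len(bounding_boxes), segmentation_mask.nbytes)
```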

View File

@@ -644,15 +644,15 @@ A useful example of this attack technique can be seen in a power analysis of a p
@fig-encryption shows the device's behavior when the correct password is entered. The red waveform captures the serial data stream, marking each byte as it is received. The blue curve records the device's power consumption over time. When the full, correct password is supplied, the power profile remains stable and consistent across all five bytes, providing a clear baseline for comparison with failed attempts.
![**Power Profile**: The device's power consumption remains stable during authentication when the correct password is entered, setting a baseline for comparison in the subsequent figures. Source: Colin O'Flynn.](images/png/Power_analysis_of_an_encryption_device_with_a_correct_password.png){#fig-encryption}
![**Power Profile**: The device's power consumption remains stable during authentication when the correct password is entered, setting a baseline for comparison in the subsequent figures. Source: Colin O'Flynn.](images/png/power_analysis_of_an_encryption_device_with_a_correct_password.png){#fig-encryption}
When an incorrect password is entered, the power analysis chart changes as shown in @fig-encryption2. In this case, the first three bytes (`0x61, 0x52, 0x77`) are correct, so the power patterns closely match the correct password up to that point. However, when the fourth byte (`0x42`) is processed and found to be incorrect, the device halts authentication. This change is reflected in the sudden jump in the blue power line, indicating that the device has stopped processing and entered an error state.
![**Side-Channel Attack Vulnerability**: Power consumption patterns reveal cryptographic key information during authentication; consistent power usage indicates correct password bytes, while abrupt changes signal incorrect input and halted processing. Even without knowing the password, an attacker can infer it by analyzing the device's power usage during authentication attempts, as shown here. Source: Colin O'Flynn.](images/png/Power_analysis_of_an_encryption_device_with_a_partially_wrong_password.png){#fig-encryption2}
![**Side-Channel Attack Vulnerability**: Power consumption patterns reveal cryptographic key information during authentication; consistent power usage indicates correct password bytes, while abrupt changes signal incorrect input and halted processing. Even without knowing the password, an attacker can infer it by analyzing the device's power usage during authentication attempts, as shown here. Source: Colin O'Flynn.](images/png/power_analysis_of_an_encryption_device_with_a_partially_wrong_password.png){#fig-encryption2}
@fig-encryption3 shows the case where the password is entirely incorrect (`0x30, 0x30, 0x30, 0x30, 0x30`). Here, the device detects the mismatch immediately after the first byte and halts processing much earlier. This is again visible in the power profile, where the blue line exhibits a sharp jump following the first byte, reflecting the device's early termination of authentication.
![**Power Consumption Jump**: The blue line's sharp increase after processing the first byte indicates immediate authentication failure, highlighting how incorrect passwords are quickly detected through power usage. Source: Colin O'Flynn.](images/png/Power_analysis_of_an_encryption_device_with_a_wrong_password.png){#fig-encryption3}
![**Power Consumption Jump**: The blue line's sharp increase after processing the first byte indicates immediate authentication failure, highlighting how incorrect passwords are quickly detected through power usage. Source: Colin O'Flynn.](images/png/power_analysis_of_an_encryption_device_with_a_wrong_password.png){#fig-encryption3}
These examples demonstrate how attackers can exploit observable power consumption differences to reduce the search space and eventually recover secret data through brute-force analysis. By systematically measuring power consumption patterns and correlating them with different inputs, attackers can extract sensitive information that should remain hidden.
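A toy Python model makes the mechanism concrete. This is a simplification of the hardware attack (the secret below is hypothetical, and a "bytes processed" count stands in for the power trace), but it shows how per-byte early exit collapses the search space from $256^5$ to at most $256 \times 5$ guesses:

```python
# Toy early-exit password check; "processed" plays the role of the visible
# power trace. SECRET is invented (only its 'aRw' prefix matches the bytes
# 0x61, 0x52, 0x77 discussed above).
SECRET = b"aRw42"


def authenticate(guess: bytes) -> tuple[bool, int]:
    """Return (ok, bytes processed); the check halts at the first mismatch."""
    processed = 0
    for g, s in zip(guess, SECRET):
        processed += 1              # each compared byte is visible in the trace
        if g != s:
            return False, processed  # early exit = the jump in the blue line
    return len(guess) == len(SECRET), processed


# Recover the secret one byte at a time: the candidate that pushes the
# "trace" one byte further is correct. Worst case: 256 * 5 attempts.
recovered = b""
for position in range(len(SECRET)):
    for candidate in range(256):
        guess = recovered + bytes([candidate]) + b"\x00" * (len(SECRET) - position - 1)
        ok, processed = authenticate(guess)
        if ok or processed > position + 1:
            recovered += bytes([candidate])
            break

print(recovered)  # b'aRw42'
```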

View File

@@ -16,11 +16,11 @@ We are grateful for the academic support that has made it possible to hire teach
::: {layout-nrow=2}
![](images/png/HDSI.png)
![](images/png/hdsi.png)
![](images/png/harvard-xtension-school.png)
![](images/png/NSF.png)
![](images/png/nsf.png)
:::
@@ -30,7 +30,7 @@ We gratefully acknowledge the support of the following non-profit organizations
::: {layout-nrow=1}
![](images/png/EDGEAI.png)
![](images/png/edgeai.png)
![](images/png/ictp.png)
@@ -42,7 +42,7 @@ The following companies contributed hardware kits used for the labs in this book
::: {layout-nrow=1}
![](images/png/Arduino.png)
![](images/png/arduino.png)
![](images/png/google.png)

View File

@@ -43,7 +43,7 @@ In this KWS project, we will focus on Stage 1 (KWS or Keyword Spotting), where w
The diagram below gives an idea of how the final KWS application should work (during inference):
\noindent
![](images/jpg/KWS_PROJ_INF_BLK.jpg)
![](images/jpg/kws_proj_inf_blk.jpg)
Our KWS application will recognize four classes of sound:
@@ -59,7 +59,7 @@ Our KWS application will recognize four classes of sound:
The main component of the KWS application is its model. So, we must train such a model with our specific keywords, noise, and other words (the "unknown"):
\noindent
![](images/jpg/KWS_PROJ_TRAIN_BLK.jpg)
![](images/jpg/kws_proj_train_blk.jpg)
## Dataset {#sec-keyword-spotting-kws-dataset-7279}
@@ -168,12 +168,12 @@ The following step is to create the features to be trained in the next phase:
We could keep the default parameter values, but we will use the DSP `Autotune parameters` option.
\noindent
![](images/jpg/ei_MFCC.jpg)
![](images/jpg/ei_mfcc.jpg)
We will take the `Raw features` (our 1-second, 16 kHz sampled audio data) and use the MFCC processing block to calculate the `Processed features`. For every 16,000 raw features (16,000 $\times$ 1 second), we will get 637 processed features $(13\times 49)$.
\noindent
![](images/jpg/MFCC.jpg){width="90%" fig-align="center"}
![](images/jpg/mfcc.jpg){width="90%" fig-align="center"}
The result shows that we only used a small amount of memory to pre-process the data (16 KB) and a latency of 34 ms, which is excellent. For example, on an Arduino Nano (Cortex-M4f \@ 64 MHz), the same pre-processing would take around 480 ms. The parameters chosen, such as the `FFT length` \[512\], will significantly impact the latency.
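For readers who want to check the shape arithmetic outside Edge Impulse, here is a sketch using `librosa` (assumed installed); the 20 ms hop length and `center=False` are assumptions chosen so the output matches the $13 \times 49$ grid above:

```python
import numpy as np
import librosa

sample_rate = 16_000
audio = np.random.randn(sample_rate).astype(np.float32)  # stand-in 1 s clip

mfcc = librosa.feature.mfcc(
    y=audio,
    sr=sample_rate,
    n_mfcc=13,       # 13 cepstral coefficients per frame
    n_fft=512,       # matches the `FFT length` [512] chosen above
    hop_length=320,  # 20 ms stride at 16 kHz (assumed)
    center=False,    # only frames where a full window fits: 49 of them
)
print(mfcc.shape)  # (13, 49)
print(mfcc.size)   # 637 processed features
```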

View File

@@ -282,7 +282,7 @@ So, for an FFT length of 32 points, the resulting output of the Spectral Analysi
Once we understand what the pre-processing does, it is time to finish the job. So, let's take the raw data (time-series type) and convert it to tabular data. For that, go to the `Spectral Features` section on the `Parameters` tab, define the main parameters as discussed in the previous section (`[FFT]` with `[32]` points), and select `[Save Parameters]`:
\noindent
![](images/jpg/Parameters_definition.jpg){width="85%" fig-align="center"}
![](images/jpg/parameters_definition.jpg){width="85%" fig-align="center"}
At the top menu, select the `Generate Features` option and then the `Generate Features` button. Each 2-second data window will be converted into one data point of 63 features.
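As a rough sketch of this time-series-to-tabular step (not Edge Impulse's exact spectral block; the 21-features-per-axis layout below is an assumption that happens to total 63):

```python
import numpy as np


def spectral_row(window: np.ndarray, fft_len: int = 32) -> np.ndarray:
    """Turn one (n_samples, 3) accelerometer window into a flat feature row."""
    features = []
    for axis in range(window.shape[1]):
        signal = window[:, axis]
        # Average FFT magnitude over successive fft_len-sample frames.
        usable = len(signal) // fft_len * fft_len
        frames = signal[:usable].reshape(-1, fft_len)
        spectrum = np.abs(np.fft.rfft(frames, axis=1)).mean(axis=0)  # 17 bins
        centered = signal - signal.mean()
        std = signal.std() + 1e-9
        features += [
            np.sqrt(np.mean(signal ** 2)),       # RMS
            np.mean(centered ** 3) / std ** 3,   # skewness
            np.mean(centered ** 4) / std ** 4,   # kurtosis
            float(np.argmax(spectrum[1:]) + 1),  # dominant non-DC bin
        ]
        features += list(spectrum)               # 17 spectral magnitude bins
    return np.asarray(features)                  # 3 axes x 21 = 63 features


window = np.random.randn(128, 3)   # stand-in 2-second, 3-axis window
print(spectral_row(window).shape)  # (63,)
```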

View File

@@ -6,7 +6,7 @@ The selected platforms are widely used in commercial applications, thereby ensur
## Our Featured Platform
![Complete XIAOML Kit with all components](seeed/xiao_esp32s3/images/png/XIAOML_Kit_Complete.png){fig-align="center"}
![Complete XIAOML Kit with all components](seeed/xiao_esp32s3/images/png/xiaoml_kit_complete.png){fig-align="center"}
The [XIAOML Kit](https://www.seeedstudio.com/blog/2025/08/05/introducing-the-xiaoml-kit-your-tinyml-journey-starts-here/) is the most recent addition to our educational hardware platforms (released on July 31st, 2025). It offers a comprehensive TinyML development environment for learning about ML systems, featuring integrated wireless connectivity, a camera, multiple sensors, and extensive documentation. This compact board exemplifies how contemporary embedded systems can efficiently provide advanced machine learning capabilities within a cost-effective framework.
@@ -118,7 +118,7 @@ The XIAOML Kit excels at wireless connectivity and cost-sensitive deployments. I
The XIAO ESP32S3 represents the category of ultra-compact, wireless-enabled microcontrollers optimized for IoT applications. The name "XIAO" (小) translates to "tiny" in Chinese, reflecting the board's 21 × 17.5 mm form factor.
![XIAO ESP32S3 development board](seeed/xiao_esp32s3/images/png/XIAOML_Kit_Complete.png){width=400}
![XIAO ESP32S3 development board](seeed/xiao_esp32s3/images/png/xiaoml_kit_complete.png){width=400}
**Processor Architecture:**
ESP32-S3 dual-core Xtensa LX7 running at 240 MHz

View File

@@ -278,7 +278,7 @@ Close the Upload Data window and return to the **Data acquisition** page. We can
Classifying images is the most common application of deep learning, but a substantial amount of data is required to accomplish this task. We have around 50 images for each category. Is this number enough? Not at all! We would need thousands of images to "teach" a model each class well enough to differentiate them. However, we can work around this by retraining a model that was previously trained on a large dataset of thousands of images. We refer to this technique as **"Transfer Learning" (TL)**. With TL, we can fine-tune a pre-trained image classification model on our data, achieving good performance even with relatively small image datasets, as in our case.
\noindent
![](./images/png/TL.png){width=85% fig-align="center"}
![](./images/png/tl.png){width=85% fig-align="center"}
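A minimal Keras sketch of the TL recipe (illustrative only; Edge Impulse handles this step for you, and the MobileNetV2 backbone, 96 × 96 input size, and four-class head are assumptions):

```python
import tensorflow as tf

# Backbone pre-trained on ImageNet; we drop its 1000-class classifier head.
base = tf.keras.applications.MobileNetV2(
    input_shape=(96, 96, 3),
    include_top=False,
    weights="imagenet",
)
base.trainable = False  # freeze the learned feature extractor

# New, small head trained on our few hundred images.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.1),
    tf.keras.layers.Dense(4, activation="softmax"),  # assumed class count
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=20)
```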

View File

@@ -1,368 +0,0 @@
#!/usr/bin/env python3
"""
GitHub Workflow Runs Cleanup Script

This script helps clean up old GitHub workflow runs while keeping a configurable
number of recent runs per workflow. Useful for cleaning up failed debugging runs.

Usage:
    python cleanup_workflow_runs.py --help
    python cleanup_workflow_runs.py --dry-run
    python cleanup_workflow_runs.py --keep 5 --token YOUR_TOKEN
    python cleanup_workflow_runs.py --keep 10 --workflow "quarto-build.yml"

Requirements:
    - GitHub personal access token with 'actions:write' scope
    - Set token via --token flag or GITHUB_TOKEN environment variable
"""

import argparse
import json
import os
import sys
import time
from datetime import datetime
from typing import Dict, List, Optional, Tuple

import requests


class GitHubWorkflowCleaner:
    """Manages cleanup of GitHub workflow runs."""

    def __init__(self, token: str, repo: str, dry_run: bool = False):
        """
        Initialize the workflow cleaner.

        Args:
            token: GitHub personal access token
            repo: Repository in format 'owner/repo'
            dry_run: If True, only preview actions without executing
        """
        self.token = token
        self.repo = repo
        self.dry_run = dry_run
        self.base_url = "https://api.github.com"
        self.headers = {
            "Authorization": f"token {token}",
            "Accept": "application/vnd.github.v3+json",
            "User-Agent": "MLSysBook-Workflow-Cleaner"
        }

    def _make_request(self, method: str, url: str, **kwargs) -> Optional[requests.Response]:
        """Make a GitHub API request with error handling."""
        try:
            response = requests.request(method, url, headers=self.headers, **kwargs)

            if response.status_code == 403:
                # Check for rate limiting
                if 'X-RateLimit-Remaining' in response.headers:
                    remaining = int(response.headers['X-RateLimit-Remaining'])
                    if remaining == 0:
                        reset_time = int(response.headers['X-RateLimit-Reset'])
                        wait_time = reset_time - int(time.time()) + 1
                        print(f"⚠️ Rate limit exceeded. Waiting {wait_time} seconds...")
                        time.sleep(wait_time)
                        return self._make_request(method, url, **kwargs)
                print(f"❌ Permission denied. Check your token has 'actions:write' scope.")
                return None
            elif response.status_code == 404:
                print(f"❌ Repository not found: {self.repo}")
                return None
            elif not response.ok:
                print(f"❌ API request failed: {response.status_code} - {response.text}")
                return None

            return response
        except requests.exceptions.RequestException as e:
            print(f"❌ Request failed: {e}")
            return None

    def get_workflows(self) -> List[Dict]:
        """Get all workflows in the repository."""
        url = f"{self.base_url}/repos/{self.repo}/actions/workflows"
        response = self._make_request("GET", url)

        if not response:
            return []

        workflows = response.json().get('workflows', [])
        print(f"📋 Found {len(workflows)} workflows")
        return workflows

    def get_workflow_runs(self, workflow_id: str, per_page: int = 100) -> List[Dict]:
        """Get all runs for a specific workflow."""
        all_runs = []
        page = 1

        while True:
            url = f"{self.base_url}/repos/{self.repo}/actions/workflows/{workflow_id}/runs"
            params = {
                "per_page": per_page,
                "page": page
            }

            response = self._make_request("GET", url, params=params)
            if not response:
                break

            data = response.json()
            runs = data.get('workflow_runs', [])

            if not runs:
                break

            all_runs.extend(runs)

            # Check if we have more pages
            if len(runs) < per_page:
                break

            page += 1

        return all_runs

    def delete_workflow_run(self, run_id: str) -> bool:
        """Delete a specific workflow run."""
        if self.dry_run:
            return True

        url = f"{self.base_url}/repos/{self.repo}/actions/runs/{run_id}"
        response = self._make_request("DELETE", url)
        return response is not None and response.status_code == 204

    def clean_workflow_runs(self, keep_count: int = 5, workflow_filter: Optional[str] = None) -> Tuple[int, int]:
        """
        Clean up old workflow runs.

        Args:
            keep_count: Number of recent runs to keep per workflow
            workflow_filter: Optional workflow name to filter (e.g., 'quarto-build.yml')

        Returns:
            Tuple of (total_runs_found, runs_to_delete)
        """
        workflows = self.get_workflows()
        if not workflows:
            return 0, 0

        total_runs = 0
        total_to_delete = 0

        for workflow in workflows:
            workflow_name = workflow['name']
            workflow_path = workflow['path'].split('/')[-1]  # Get filename
            workflow_id = workflow['id']

            # Apply filter if specified
            if workflow_filter and workflow_filter not in [workflow_name, workflow_path]:
                continue

            print(f"\n🔍 Processing workflow: {workflow_name} ({workflow_path})")

            # Get all runs for this workflow
            runs = self.get_workflow_runs(workflow_id)
            total_runs += len(runs)

            if len(runs) <= keep_count:
                print(f"   ✅ Only {len(runs)} runs found, keeping all")
                continue

            # Sort runs by creation date (newest first)
            runs.sort(key=lambda x: x['created_at'], reverse=True)

            # Identify runs to delete (everything after keep_count)
            runs_to_keep = runs[:keep_count]
            runs_to_delete = runs[keep_count:]

            print(f"   📊 Total runs: {len(runs)}")
            print(f"   📌 Keeping: {len(runs_to_keep)} most recent")
            print(f"   🗑️ To delete: {len(runs_to_delete)}")

            if self.dry_run:
                print(f"   🔍 DRY RUN: Would delete {len(runs_to_delete)} runs")
                total_to_delete += len(runs_to_delete)
                continue

            # Delete old runs
            deleted_count = 0
            for run in runs_to_delete:
                run_id = run['id']
                run_number = run['run_number']
                status = run['status']
                conclusion = run['conclusion']
                created_at = run['created_at']

                print(f"   🗑️ Deleting run #{run_number} ({status}/{conclusion}) from {created_at}")

                if self.delete_workflow_run(run_id):
                    deleted_count += 1
                    # Small delay to avoid overwhelming the API
                    time.sleep(0.5)
                else:
                    print(f"   ❌ Failed to delete run #{run_number}")

            total_to_delete += deleted_count
            print(f"   ✅ Successfully deleted {deleted_count}/{len(runs_to_delete)} runs")

        return total_runs, total_to_delete

    def show_workflow_summary(self):
        """Show a summary of all workflows and their run counts."""
        workflows = self.get_workflows()
        if not workflows:
            return

        print(f"\n📊 Workflow Summary for {self.repo}")
        print("=" * 60)

        total_runs = 0
        for workflow in workflows:
            workflow_name = workflow['name']
            workflow_path = workflow['path'].split('/')[-1]
            workflow_id = workflow['id']

            runs = self.get_workflow_runs(workflow_id)
            run_count = len(runs)
            total_runs += run_count

            # Count by status
            statuses = {}
            for run in runs:
                status = f"{run['status']}/{run.get('conclusion', 'N/A')}"
                statuses[status] = statuses.get(status, 0) + 1

            print(f"{workflow_name} ({workflow_path}): {run_count} runs")
            for status, count in sorted(statuses.items()):
                print(f"   - {status}: {count}")

        print(f"\n📈 Total workflow runs across all workflows: {total_runs}")


def get_repo_from_git() -> Optional[str]:
    """Try to determine repository from git remote."""
    try:
        import subprocess
        result = subprocess.run(
            ['git', 'remote', 'get-url', 'origin'],
            capture_output=True,
            text=True,
            check=True
        )
        remote_url = result.stdout.strip()

        # Parse GitHub URL
        if 'github.com' in remote_url:
            if remote_url.startswith('git@github.com:'):
                repo = remote_url.replace('git@github.com:', '').replace('.git', '')
            elif remote_url.startswith('https://github.com/'):
                repo = remote_url.replace('https://github.com/', '').replace('.git', '')
            else:
                return None
            return repo
    except:
        return None

    return None


def main():
    parser = argparse.ArgumentParser(
        description="Clean up old GitHub workflow runs",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  # Show summary of all workflow runs
  python cleanup_workflow_runs.py --summary

  # Dry run - see what would be deleted
  python cleanup_workflow_runs.py --dry-run --keep 5

  # Clean up, keeping 10 most recent runs per workflow
  python cleanup_workflow_runs.py --keep 10

  # Clean up specific workflow only
  python cleanup_workflow_runs.py --workflow "quarto-build.yml" --keep 3

Environment Variables:
  GITHUB_TOKEN - GitHub personal access token (alternative to --token)
        """
    )

    parser.add_argument(
        '--token',
        help='GitHub personal access token (or set GITHUB_TOKEN env var)'
    )
    parser.add_argument(
        '--repo',
        help='Repository in format owner/repo (auto-detected from git if not provided)'
    )
    parser.add_argument(
        '--keep',
        type=int,
        default=5,
        help='Number of recent workflow runs to keep per workflow (default: 5)'
    )
    parser.add_argument(
        '--workflow',
        help='Clean specific workflow only (by name or filename)'
    )
    parser.add_argument(
        '--dry-run',
        action='store_true',
        help='Preview what would be deleted without actually deleting'
    )
    parser.add_argument(
        '--summary',
        action='store_true',
        help='Show summary of workflow runs and exit'
    )

    args = parser.parse_args()

    # Get GitHub token
    token = args.token or os.getenv('GITHUB_TOKEN')
    if not token:
        print("❌ GitHub token required. Use --token flag or set GITHUB_TOKEN environment variable")
        print("   Generate token at: https://github.com/settings/tokens")
        print("   Required scopes: actions:write, repo")
        sys.exit(1)

    # Get repository
    repo = args.repo or get_repo_from_git()
    if not repo:
        print("❌ Repository not specified and could not auto-detect from git")
        print("   Use --repo owner/repo or run from a git repository")
        sys.exit(1)

    print(f"🚀 GitHub Workflow Cleanup for {repo}")
    print(f"   Token: {'*' * (len(token) - 4)}{token[-4:]}")
    print(f"   Mode: {'DRY RUN' if args.dry_run else 'LIVE'}")

    # Initialize cleaner
    cleaner = GitHubWorkflowCleaner(token, repo, args.dry_run)

    if args.summary:
        cleaner.show_workflow_summary()
        return

    # Clean workflow runs
    print(f"\n🧹 Starting cleanup (keeping {args.keep} runs per workflow)")
    if args.workflow:
        print(f"   Filtering to workflow: {args.workflow}")

    total_runs, deleted_runs = cleaner.clean_workflow_runs(
        keep_count=args.keep,
        workflow_filter=args.workflow
    )

    print(f"\n📊 Cleanup Summary")
    print("=" * 40)
    print(f"Total workflow runs found: {total_runs}")
    if args.dry_run:
        print(f"Runs that would be deleted: {deleted_runs}")
        print("\n💡 Run without --dry-run to actually delete the runs")
    else:
        print(f"Runs successfully deleted: {deleted_runs}")
        print("✅ Cleanup completed!")


if __name__ == "__main__":
    main()

View File

@@ -0,0 +1,96 @@
#!/usr/bin/env python3
"""
Fixes image reference case mismatches in Quarto markdown files.

This script reads the output of the `validate-image-references` pre-commit hook,
extracts the case mismatch errors, and corrects the references in the .qmd files.
"""

import argparse
import re
from pathlib import Path
import sys


def parse_pre_commit_output(output_file: Path) -> list[tuple[str, str]]:
    """
    Parses the pre-commit output to find case mismatch errors.

    Args:
        output_file: Path to the file containing the pre-commit output.

    Returns:
        A list of tuples, where each tuple contains the incorrect and correct filenames.
    """
    mismatches = []
    with open(output_file, 'r') as f:
        content = f.read()

    pattern = r"Case mismatch: expected '([^']*)' but found '([^']*)'"
    matches = re.findall(pattern, content)
    for expected, found in matches:
        mismatches.append((expected, found))

    return mismatches


def find_qmd_files(base_dir: Path) -> list[Path]:
    """Finds all .qmd files in the base directory."""
    return list(base_dir.rglob("*.qmd"))


def fix_references_in_file(qmd_file: Path, mismatches: list[tuple[str, str]]):
    """
    Fixes the image references in a single .qmd file.

    Args:
        qmd_file: Path to the .qmd file to fix.
        mismatches: A list of tuples with incorrect and correct filenames.
    """
    try:
        with open(qmd_file, 'r', encoding='utf-8') as f:
            content = f.read()
    except FileNotFoundError:
        return  # Skip if file not found

    original_content = content
    for incorrect, correct in mismatches:
        content = content.replace(incorrect, correct)

    if content != original_content:
        with open(qmd_file, 'w', encoding='utf-8') as f:
            f.write(content)
        print(f"✅ Corrected references in {qmd_file}")


def main():
    parser = argparse.ArgumentParser(
        description="Fix image reference case mismatches in Quarto files."
    )
    parser.add_argument(
        "input_file",
        type=Path,
        help="Path to the file containing the pre-commit output."
    )
    parser.add_argument(
        "-d", "--directory",
        type=Path,
        default=Path("quarto/contents"),
        help="The directory to search for .qmd files."
    )
    args = parser.parse_args()

    if not args.input_file.exists():
        print(f"❌ Error: Input file not found at {args.input_file}")
        sys.exit(1)

    mismatches = parse_pre_commit_output(args.input_file)
    if not mismatches:
        print("No case mismatches found in the input file.")
        return

    qmd_files = find_qmd_files(args.directory)
    for qmd_file in qmd_files:
        fix_references_in_file(qmd_file, mismatches)

    print("\nDone.")


if __name__ == "__main__":
    main()

View File

@@ -1,68 +0,0 @@
import re
import sys
import os


def update_callouts(text):
    callout_with_title_pattern = re.compile(r':::\{.*?title=".*?".*?\}')
    callout_without_title_pattern = re.compile(
        r'(:::\{(?P<class_id>[^\n]+?)\})\n\n(?!<!--)(?P<header>#{1,6} (?P<title>[^\n]+))\n',
        re.MULTILINE
    )

    def replacer(match):
        class_id = match.group('class_id').strip()
        title = match.group('title')
        updated_callout = f":::{{{class_id} title=\"{title}\"}}\n"
        return updated_callout

    text_without_titled_callouts = callout_with_title_pattern.sub(lambda m: m.group(0), text)
    updated_text = callout_without_title_pattern.sub(replacer, text_without_titled_callouts)
    return updated_text


def process_file(filepath):
    with open(filepath, 'r', encoding='utf-8') as f:
        content = f.read()

    updated_content = update_callouts(content)

    if content != updated_content:
        with open(filepath, 'w', encoding='utf-8') as f:
            f.write(updated_content)
        print(f"Updated: {filepath}")
    else:
        print(f"No changes: {filepath}")


def process_directory(directory):
    for root, _, files in os.walk(directory):
        for file in files:
            if file.endswith(".qmd"):
                filepath = os.path.join(root, file)
                process_file(filepath)


def print_usage():
    print("Usage:")
    print("  python3 fixtitle.py -d <directory>   # Process all .qmd files in the directory recursively")
    print("  python3 fixtitle.py -f <file>        # Process a single .qmd file")
    sys.exit(1)


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print_usage()

    option, path = sys.argv[1], sys.argv[2]

    if option == "-d":
        if not os.path.isdir(path):
            print(f"Error: '{path}' is not a valid directory.")
            sys.exit(1)
        process_directory(path)
    elif option == "-f":
        if not os.path.isfile(path) or not path.endswith(".qmd"):
            print(f"Error: '{path}' is not a valid .qmd file.")
            sys.exit(1)
        process_file(path)
    else:
        print_usage()