Adds self-referential section checker

Implements a script to detect self-referential or circular section
references within Quarto files. This helps identify potential writing
issues where a section refers to itself, its parent, or its child.
Vijay Janapa Reddi
2025-10-17 10:22:07 -04:00
parent aef3fdfb34
commit 10efe50b47
2 changed files with 328 additions and 3 deletions


@@ -309,7 +309,7 @@ Modern machine learning frameworks operate through the integration of four key l
**Framework Layer Interaction**: Modern machine learning frameworks organize functionality into distinct layers (fundamentals, data handling, developer interface, and execution & abstraction) that collaborate to streamline model building and deployment. This layered architecture enables modularity and allows developers to focus on specific aspects of the machine learning workflow without needing to manage low-level infrastructure.
:::
-The Fundamentals layer establishes the structural basis of these frameworks through computational graphs. These graphs use the directed acyclic graph (DAG) representation detailed in @sec-ai-frameworks-computational-graphs-f0ff, enabling automatic differentiation and optimization. By organizing operations and data dependencies, computational graphs provide the framework with the ability to distribute workloads and execute computations across a variety of hardware platforms.
+The Fundamentals layer establishes the structural basis of these frameworks through computational graphs. These graphs use the directed acyclic graph (DAG) representation, enabling automatic differentiation and optimization. By organizing operations and data dependencies, computational graphs provide the framework with the ability to distribute workloads and execute computations across a variety of hardware platforms.
Building upon this structural foundation, the Data Handling layer manages numerical data and parameters essential for machine learning workflows. Central to this layer are specialized data structures, such as tensors, which handle high-dimensional arrays while optimizing memory usage and device placement. Memory management and data movement strategies ensure that computational workloads are executed effectively, particularly in environments with diverse or limited hardware resources.
@@ -323,11 +323,11 @@ Our exploration begins with computational graphs because they form the structura
### Computational Graphs {#sec-ai-frameworks-computational-graphs-f0ff}
-The computational graph serves as the central abstraction enabling frameworks to transform intuitive model descriptions into efficient hardware execution. This representation organizes mathematical operations and their dependencies to enable automatic optimization, parallelization, and hardware specialization.
+The computational graph is the central abstraction that enables frameworks to transform intuitive model descriptions into efficient hardware execution. This representation organizes mathematical operations and their dependencies to enable automatic optimization, parallelization, and hardware specialization.
#### Computational Graph Fundamentals {#sec-ai-frameworks-computational-graph-fundamentals-4979}
-Computational graphs emerged as a key abstraction in machine learning frameworks to address the growing complexity of deep learning models. As models grew larger and more complex, efficient execution across diverse hardware platforms became necessary. The computational graph transforms high-level model descriptions into efficient low-level hardware execution [@baydin2018], representing a machine learning model as a directed acyclic graph[^fn-dag-ml] (DAG) where nodes represent operations and edges represent data flow. This DAG abstraction enables automatic differentiation and efficient optimization across diverse hardware platforms, as detailed in @sec-ai-frameworks-computational-graphs-f0ff.
+Computational graphs emerged as a key abstraction in machine learning frameworks to address the growing complexity of deep learning models. As models grew larger and more complex, efficient execution across diverse hardware platforms became necessary. The computational graph transforms high-level model descriptions into efficient low-level hardware execution [@baydin2018], representing a machine learning model as a directed acyclic graph[^fn-dag-ml] (DAG) where nodes represent operations and edges represent data flow. This DAG abstraction enables automatic differentiation and efficient optimization across diverse hardware platforms.
[^fn-dag-ml]: **Directed Acyclic Graph (DAG)**: In machine learning frameworks, DAGs represent computation where nodes are operations (like matrix multiplication or activation functions) and edges are data dependencies. Unlike general DAGs in computer science, ML computational graphs specifically optimize for automatic differentiation, enabling frameworks to compute gradients by traversing the graph in reverse order.


@@ -0,0 +1,325 @@
#!/usr/bin/env python3
"""
Detect self-referential or circular section references in Quarto files.

This script identifies cases where:
1. A section refers to itself
2. A section refers to its immediate parent
3. A section refers to its immediate child

These patterns usually indicate writing issues that should be reviewed.
"""

import re
import sys
from pathlib import Path
from typing import Dict, List, Tuple, Optional
from collections import defaultdict


def parse_heading_structure(content: str, filepath: Path) -> List[Dict]:
    """
    Parse all headings from a Quarto file with their IDs, levels, and positions.

    Returns:
        List of dicts with keys: level, title, id, line_num, parent_id
    """
    headings = []
    lines = content.split('\n')

    # Stack to track parent sections at each level
    parent_stack = {}  # level -> heading dict

    for line_num, line in enumerate(lines, start=1):
        # Match Markdown headings with optional IDs
        # Format: ### Heading {#sec-id-here}
        heading_match = re.match(r'^(#{1,6})\s+(.+?)(?:\s+\{#([^\}]+)\})?$', line)

        if heading_match:
            level = len(heading_match.group(1))
            title = heading_match.group(2).strip()
            section_id = heading_match.group(3)

            # Find parent (closest heading with lower level)
            parent_id = None
            for parent_level in range(level - 1, 0, -1):
                if parent_level in parent_stack:
                    parent_id = parent_stack[parent_level]['id']
                    break

            heading_dict = {
                'level': level,
                'title': title,
                'id': section_id,
                'line_num': line_num,
                'parent_id': parent_id,
                'filepath': filepath
            }
            headings.append(heading_dict)

            # Update parent stack
            parent_stack[level] = heading_dict
            # Clear deeper levels
            parent_stack = {k: v for k, v in parent_stack.items() if k <= level}

    return headings


def extract_cross_references(content: str, filepath: Path) -> List[Dict]:
    """
    Extract all section cross-references from content.

    Returns:
        List of dicts with keys: ref_id, line_num, context
    """
    references = []
    lines = content.split('\n')

    for line_num, line in enumerate(lines, start=1):
        # Find all @sec- references
        matches = re.finditer(r'@(sec-[a-zA-Z0-9\-]+)', line)
        for match in matches:
            ref_id = match.group(1)
            references.append({
                'ref_id': ref_id,
                'line_num': line_num,
                'context': line.strip(),
                'filepath': filepath
            })

    return references


def build_section_hierarchy(headings: List[Dict]) -> Tuple[Dict, Dict]:
    """
    Build mappings for section relationships.

    Returns:
        (section_map, children_map)
        - section_map: id -> heading dict
        - children_map: id -> list of child section ids
    """
    section_map = {}
    children_map = defaultdict(list)

    for heading in headings:
        if heading['id']:
            section_map[heading['id']] = heading

            # Track parent-child relationships
            if heading['parent_id']:
                children_map[heading['parent_id']].append(heading['id'])

    return section_map, children_map


def find_section_for_reference(ref_line_num: int, headings: List[Dict]) -> Optional[Dict]:
    """
    Find which section a reference belongs to based on line number.
    """
    current_section = None

    for heading in headings:
        if heading['line_num'] <= ref_line_num:
            current_section = heading
        else:
            break

    return current_section


def check_self_referential_issues(filepath: Path) -> List[Dict]:
    """
    Check a single file for self-referential section issues.

    Returns:
        List of issue dicts with keys: type, section, reference, line_num, message
    """
    issues = []

    try:
        content = filepath.read_text(encoding='utf-8')
    except Exception as e:
        print(f"Warning: Could not read {filepath}: {e}", file=sys.stderr)
        return issues

    headings = parse_heading_structure(content, filepath)
    references = extract_cross_references(content, filepath)
    section_map, children_map = build_section_hierarchy(headings)

    for ref in references:
        ref_id = ref['ref_id']
        ref_line_num = ref['line_num']

        # Find which section this reference is in
        current_section = find_section_for_reference(ref_line_num, headings)

        if not current_section or not current_section['id']:
            continue

        current_id = current_section['id']

        # Check for self-reference (exact match)
        if ref_id == current_id:
            issues.append({
                'type': 'self_reference',
                'section': current_section['title'],
                'section_id': current_id,
                'reference_id': ref_id,
                'line_num': ref_line_num,
                'filepath': filepath,
                'message': f"Section refers to itself: '{current_section['title']}' references @{ref_id}"
            })

        # Check for parent reference
        elif current_section['parent_id'] == ref_id:
            parent_section = section_map.get(ref_id)
            parent_title = parent_section['title'] if parent_section else 'Unknown'
            issues.append({
                'type': 'parent_reference',
                'section': current_section['title'],
                'section_id': current_id,
                'reference_id': ref_id,
                'line_num': ref_line_num,
                'filepath': filepath,
                'message': f"Section refers to its parent: '{current_section['title']}' references parent '{parent_title}' (@{ref_id})"
            })

        # Check for immediate child reference
        elif ref_id in children_map.get(current_id, []):
            child_section = section_map.get(ref_id)
            child_title = child_section['title'] if child_section else 'Unknown'
            issues.append({
                'type': 'child_reference',
                'section': current_section['title'],
                'section_id': current_id,
                'reference_id': ref_id,
                'line_num': ref_line_num,
                'filepath': filepath,
                'message': f"Section refers to its immediate child: '{current_section['title']}' references child '{child_title}' (@{ref_id})"
            })

    return issues


def scan_directory(directory: Path, pattern: str = "**/*.qmd") -> List[Dict]:
    """
    Scan all Quarto files in a directory for self-referential issues.
    """
    all_issues = []

    for filepath in directory.glob(pattern):
        if filepath.is_file():
            issues = check_self_referential_issues(filepath)
            all_issues.extend(issues)

    return all_issues


def print_report(issues: List[Dict], verbose: bool = False):
    """
    Print a formatted report of issues found.
    """
    if not issues:
        print("✅ No self-referential section issues found.")
        return

    # Group by type
    by_type = defaultdict(list)
    for issue in issues:
        by_type[issue['type']].append(issue)

    print(f"\n🔍 Found {len(issues)} self-referential section issue(s):\n")

    for issue_type in ['self_reference', 'parent_reference', 'child_reference']:
        if issue_type not in by_type:
            continue

        type_name = issue_type.replace('_', ' ').title()
        type_issues = by_type[issue_type]

        print(f"\n{type_name} ({len(type_issues)} issue(s)):")
        print("=" * 80)

        for issue in type_issues:
            rel_path = issue['filepath'].relative_to(Path.cwd()) if issue['filepath'].is_relative_to(Path.cwd()) else issue['filepath']
            print(f"\n File: {rel_path}")
            print(f" Line: {issue['line_num']}")
            print(f" {issue['message']}")

            if verbose:
                print(f" Section ID: {issue['section_id']}")
                print(f" Reference ID: {issue['reference_id']}")

    print("\n" + "=" * 80)
    print(f"Total: {len(issues)} issue(s) found\n")


def main():
    """
    Main entry point for the script.
    """
    import argparse

    parser = argparse.ArgumentParser(
        description="Detect self-referential section references in Quarto files",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  # Check a specific file
  python check_self_referential_sections.py quarto/contents/core/frameworks/frameworks.qmd

  # Check all files in a directory
  python check_self_referential_sections.py quarto/contents/

  # Check with verbose output
  python check_self_referential_sections.py quarto/contents/ --verbose
"""
    )
    parser.add_argument(
        'path',
        type=Path,
        nargs='?',
        default=Path('quarto/contents'),
        help='Path to file or directory to check (default: quarto/contents)'
    )
    parser.add_argument(
        '-v', '--verbose',
        action='store_true',
        help='Show detailed output including section and reference IDs'
    )
    parser.add_argument(
        '--pattern',
        default='**/*.qmd',
        help='Glob pattern for files to check (default: **/*.qmd)'
    )

    args = parser.parse_args()
    path = args.path.resolve()

    if not path.exists():
        print(f"Error: Path does not exist: {path}", file=sys.stderr)
        sys.exit(1)

    # Scan files
    if path.is_file():
        issues = check_self_referential_issues(path)
    else:
        issues = scan_directory(path, args.pattern)

    # Print report
    print_report(issues, verbose=args.verbose)

    # Exit with error code if issues found
    sys.exit(1 if issues else 0)


if __name__ == '__main__':
    main()
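
To illustrate the three patterns the script reports, here is a condensed, standalone sketch that applies the same two regular expressions to a small in-memory sample. It is not part of the commit: `SAMPLE`, its section IDs, and the helper `classify_references` are hypothetical names introduced for illustration, and the sketch collapses the script's separate parsing and hierarchy steps into one function.

```python
import re

# Hypothetical sample document; the section IDs are made up for illustration.
SAMPLE = """\
## Graphs {#sec-graphs}
Intro text, see @sec-graphs-basics for details.
### Basics {#sec-graphs-basics}
As described in @sec-graphs-basics, nodes are operations.
This builds on @sec-graphs and relates to @sec-other.
"""

# Same patterns as the script above.
HEADING_RE = re.compile(r'^(#{1,6})\s+(.+?)(?:\s+\{#([^\}]+)\})?$')
REF_RE = re.compile(r'@(sec-[a-zA-Z0-9\-]+)')


def classify_references(text):
    """Return (line_num, kind, ref_id) for self, parent, and child references."""
    lines = text.split('\n')

    # First pass: record each section's parent and the section enclosing each line.
    parent_of = {}        # section id -> parent section id (or None)
    stack = {}            # heading level -> section id of the latest heading
    section_at_line = {}  # line number -> id of the enclosing section
    current = None
    for num, line in enumerate(lines, start=1):
        m = HEADING_RE.match(line)
        if m:
            level, sec_id = len(m.group(1)), m.group(3)
            # Parent is the closest open heading at a shallower level.
            parent = next(
                (stack[lvl] for lvl in range(level - 1, 0, -1) if lvl in stack),
                None,
            )
            # Drop this level and anything deeper, then register this heading.
            stack = {lvl: sid for lvl, sid in stack.items() if lvl < level}
            stack[level] = sec_id
            if sec_id:
                parent_of[sec_id] = parent
            current = sec_id
        section_at_line[num] = current

    # Second pass: compare each reference with the section that contains it.
    findings = []
    for num, line in enumerate(lines, start=1):
        here = section_at_line[num]
        if not here:
            continue
        for m in REF_RE.finditer(line):
            ref = m.group(1)
            if ref == here:
                findings.append((num, 'self_reference', ref))
            elif parent_of.get(here) == ref:
                findings.append((num, 'parent_reference', ref))
            elif parent_of.get(ref) == here:
                findings.append((num, 'child_reference', ref))
    return findings


for finding in classify_references(SAMPLE):
    print(finding)
```

On this sample the sketch flags line 2 as a child reference, line 4 as a self-reference, and the `@sec-graphs` mention on line 5 as a parent reference, while `@sec-other` is left alone because it is unrelated to the enclosing section. Like the script, the sketch treats any line starting with `#` as a heading, so comment lines inside code blocks in a `.qmd` file may be picked up as headings.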