Section ID Management System
Overview
The section ID management system provides automated tools for managing unique, consistent section IDs in Quarto/Markdown book projects. The system uses a hierarchy-based approach to generate stable, meaningful section IDs that reflect the actual document structure and ensures global uniqueness across the entire book project.
Key Features
Hierarchy-Based ID Generation
- Stable IDs: Section IDs remain consistent even when sections are reordered (as long as the hierarchy doesn't change)
- Meaningful Structure: IDs reflect the actual document organization and parent-child relationships
- Natural Duplicate Handling: Sections with the same name but different parents automatically get different IDs
- No Counter Dependency: No need to worry about section reordering affecting IDs
- Global Uniqueness: File path inclusion ensures unique IDs across the entire book project
ID Format
Judging from the examples in this document, each ID has the form sec-{chapter-slug}-{section-slug}-{hash}, for example sec-getting-started-introduction-d212: the slugified chapter and section titles followed by a 4-character hash.
Hash Generation
The hash is generated from:
{file_path}|{chapter_title}|{section_title}|{parent_hierarchy}
Where:
- file_path: The file path (ensures global uniqueness across different files)
- chapter_title: The chapter title
- section_title: The section title
- parent_hierarchy: A pipe-separated list of all parent sections (e.g., parent1|parent2|parent3)
Global Uniqueness Guarantee
The inclusion of the file path in the hash generation ensures that sections with identical names and hierarchies in different files will have different IDs. This prevents conflicts when:
- Multiple chapters have sections with the same name (e.g., "Introduction" in different files)
- Different files have identical section hierarchies (e.g., "Techniques > Advanced > Optimization")
- The same section name appears in multiple contexts across the book
Example: Same Section Name in Different Files
# File: contents/chapter1.qmd
# Getting Started
## Introduction {#sec-getting-started-introduction-d212}
# File: contents/chapter2.qmd
# Getting Started
## Introduction {#sec-getting-started-introduction-8435}
Hash inputs:
- File 1: "contents/chapter1.qmd|Getting Started|Introduction|" → hash: d212
- File 2: "contents/chapter2.qmd|Getting Started|Introduction|" → hash: 8435
Result: The two files produce different 4-character hashes, so the otherwise identical IDs stay distinct across the book.
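The comparison above can be reproduced with a few lines of Python. This is a minimal sketch, assuming the hash input is exactly the pipe-joined string listed under "Hash inputs"; hash_suffix is a hypothetical helper name and not necessarily a function in section_id_manager.py.

```python
import hashlib


def hash_suffix(file_path, chapter_title, section_title, hierarchy=""):
    """Return the first 4 hex characters of the SHA-1 of the pipe-joined inputs.

    Hypothetical helper for illustration; the real tool may normalize its
    inputs differently before hashing.
    """
    hash_input = f"{file_path}|{chapter_title}|{section_title}|{hierarchy}"
    return hashlib.sha1(hash_input.encode("utf-8")).hexdigest()[:4]


# Same chapter and section titles, different files: only the path differs.
suffix_1 = hash_suffix("contents/chapter1.qmd", "Getting Started", "Introduction")
suffix_2 = hash_suffix("contents/chapter2.qmd", "Getting Started", "Introduction")
print(suffix_1, suffix_2)
```

Because the file path is the first field of the hash input, moving or renaming a file changes the suffix of every section in it.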
How It Works
1. Hierarchy Tracking
The system maintains a stack of parent sections as it processes the document:
section_hierarchy = []  # Stack of section titles below the chapter title

# For each header (level 2 and deeper), drop entries at the same or a deeper level
while section_hierarchy and len(section_hierarchy) >= header_level - 1:
    section_hierarchy.pop()
section_hierarchy.append(title.strip())

# Parent sections are everything above the current section
parent_sections = section_hierarchy[:-1]
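To see the stack in action on a whole document, the fragment above can be wrapped in a small generator. This is a sketch under stated assumptions: level-1 headers are chapter titles and are skipped, the input contains no fenced code blocks, and iter_sections and HEADER_RE are hypothetical names introduced here.

```python
import re

# Matches level 2-6 ATX headers, with an optional trailing {#id} attribute.
HEADER_RE = re.compile(r"^(#{2,6})\s+(.*?)\s*(\{#[^}]*\})?\s*$")


def iter_sections(lines):
    """Yield (level, title, parent_sections) for each level-2+ header."""
    section_hierarchy = []  # titles from level 2 down to the current header
    for line in lines:
        match = HEADER_RE.match(line)
        if not match:
            continue
        level = len(match.group(1))
        title = match.group(2).strip()
        # Drop entries at the same or a deeper level than the new header.
        while section_hierarchy and len(section_hierarchy) >= level - 1:
            section_hierarchy.pop()
        section_hierarchy.append(title)
        yield level, title, section_hierarchy[:-1]


doc = [
    "# Introduction",
    "## AI Evolution",
    "### Symbolic AI Era",
    "#### Data Considerations",
    "### Expert Systems Era",
    "#### Data Considerations",
]
for level, title, parents in iter_sections(doc):
    print(level, title, parents)
```

The two "Data Considerations" headers come out with different parent lists, which is what later makes their hashes differ.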
2. Hash Generation
import hashlib

# Build hierarchy string from slugified parent sections
hierarchy = ""
if parent_sections:
    hierarchy_parts = []
    for parent in parent_sections:
        hierarchy_parts.append(simple_slugify(parent))
    hierarchy = "|".join(hierarchy_parts)

# Generate hash with file path for global uniqueness
hash_input = f"{file_path}|{chapter_title}|{title}|{hierarchy}".encode('utf-8')
hash_suffix = hashlib.sha1(hash_input).hexdigest()[:4]  # first 4 hex characters
Example
Consider a document with this structure:
# Introduction
## AI Evolution
### Symbolic AI Era
#### Data Considerations
### Expert Systems Era
#### Data Considerations
### Deep Learning Era
#### Data Considerations
The three "Data Considerations" sections will get different IDs:
- sec-introduction-data-considerations-d32a (under Symbolic AI Era)
- sec-introduction-data-considerations-8ae1 (under Expert Systems Era)
- sec-introduction-data-considerations-fdab (under Deep Learning Era)
Benefits Over Counter-Based Approach
| Aspect | Counter-Based | Hierarchy-Based |
|---|---|---|
| Stability | Changes when sections reordered | Stable unless hierarchy changes |
| Meaning | Arbitrary position-based | Reflects document structure |
| Duplicates | Requires manual counter management | Handled naturally by context |
| Maintenance | Fragile to document changes | Robust and self-maintaining |
| Global Uniqueness | May conflict across files | Guaranteed by file path inclusion |
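The Stability row can be illustrated by reordering sibling sections and comparing the two schemes. This is a sketch only: the counter-based format sec-N is a made-up stand-in (the old format is not described here), and the file path contents/intro.qmd is a hypothetical example.

```python
import hashlib


def counter_ids(titles):
    """Counter-based scheme (illustrative format): the ID depends on position."""
    return {title: f"sec-{index}" for index, title in enumerate(titles, start=1)}


def hierarchy_ids(titles, file_path, chapter_title, parents):
    """Hierarchy-based scheme: the suffix depends on path, titles, and parents."""
    ids = {}
    for title in titles:
        hash_input = f"{file_path}|{chapter_title}|{title}|{'|'.join(parents)}"
        ids[title] = hashlib.sha1(hash_input.encode("utf-8")).hexdigest()[:4]
    return ids


siblings = ["Symbolic AI Era", "Expert Systems Era", "Deep Learning Era"]
reordered = list(reversed(siblings))

# Hypothetical file path, chapter title, and slugified parent list.
args = ("contents/intro.qmd", "Introduction", ["ai-evolution"])
print(counter_ids(siblings) == counter_ids(reordered))                    # False
print(hierarchy_ids(siblings, *args) == hierarchy_ids(reordered, *args))  # True
```

Position never enters the hash input, so swapping siblings leaves every suffix unchanged; moving a section under a different parent does change it.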
Usage
Basic Commands
# Add missing IDs
python section_id_manager.py -d contents/
# Repair existing IDs to new format
python section_id_manager.py -d contents/ --repair --backup
# Verify all IDs
python section_id_manager.py -d contents/ --verify
# List all IDs
python section_id_manager.py -d contents/ --list
Safety Features
- Backup Creation: --backup creates timestamped backups
- Dry Run: --dry-run previews changes without modifying files
- Interactive Prompts: Confirms changes before applying
- Force Mode: --force automatically accepts all changes
Migration from Counter-Based System
If you have existing counter-based IDs, repair mode migrates them to the new format:
- Run repair mode: python section_id_manager.py -d contents/ --repair --backup
- The system will update all IDs to the new hierarchy-based format
- Cross-references will be automatically updated
- Old IDs are preserved in the backup files
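The cross-reference step can be pictured as a mapping from old IDs to new ones applied to the text. The sketch below is not the tool's implementation; it assumes Quarto-style references (@sec-...) and header attributes ({#sec-...}), and update_cross_references and the sample IDs are hypothetical.

```python
import re


def update_cross_references(text, id_map):
    """Replace old section IDs with new ones in definitions and @-references.

    id_map maps old IDs to new IDs (the values used below are made up).
    """
    if not id_map:
        return text
    # Longest IDs first so that "sec-a-10" is not shadowed by "sec-a-1".
    ordered = sorted(id_map, key=len, reverse=True)
    alternatives = "|".join(re.escape(old) for old in ordered)
    # Match an ID directly after "@" or "#", not followed by more ID characters.
    pattern = re.compile(rf"(?<=[@#])({alternatives})(?![\w-])")
    return pattern.sub(lambda m: id_map[m.group(1)], text)


sample = "See @sec-a-1 and @sec-a-10.\n## Title {#sec-a-1}\n"
mapping = {"sec-a-1": "sec-new-one-aaaa", "sec-a-10": "sec-new-ten-bbbb"}
print(update_cross_references(sample, mapping))
```

Matching whole IDs only (the negative lookahead) avoids rewriting a longer ID that merely starts with an old one.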
Best Practices
- Use backups: Always use --backup when making bulk changes
- Verify before commits: Use --verify to ensure ID integrity
- Preview changes: Use --dry-run to see what will change
- Consider automation: Use in pre-commit hooks or CI pipelines
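For the automation suggestion, a hook or CI step could wrap the --verify command in a small script. This is a sketch under one unconfirmed assumption: that section_id_manager.py signals verification failure with a non-zero exit code. The function names are hypothetical.

```python
import subprocess
import sys


def build_verify_command(directory="contents/", script="section_id_manager.py"):
    """Build the --verify invocation shown under Basic Commands."""
    return [sys.executable, script, "-d", directory, "--verify"]


def main():
    # Assumption: the script returns a non-zero exit code when verification fails.
    result = subprocess.run(build_verify_command())
    return result.returncode

# In a hook script, finish with: sys.exit(main())
```

If the exit-code assumption does not hold, the wrapper would need to inspect the script's output instead.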
Technical Details
Function Signature
def generate_section_id(title, file_path, chapter_title, section_counter, parent_sections=None):
Parameters
- title: The section title
- file_path: The file path (included in hash for global uniqueness)
- chapter_title: The chapter title
- section_counter: Counter for this section (not used in hash)
- parent_sections: List of parent section titles (included in hash)
Parent Sections Format
- parent_sections is a list of strings representing the full hierarchy
- Each parent is processed through simple_slugify() to remove stopwords
- Parents are joined with | separator in the hash input
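Putting the signature, parameters, and parent format together gives roughly the following. This is a sketch, not the code in section_id_manager.py: the simple_slugify stand-in and its stopword list are assumptions, and the sec-{chapter}-{section}-{hash} layout is inferred from the example IDs in this document.

```python
import hashlib
import re

# Assumed stopword list, for illustration only.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "on", "for", "to", "with"}


def simple_slugify(text):
    """Stand-in for the tool's simple_slugify(): lowercase, drop stopwords, hyphenate."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return "-".join(word for word in words if word not in STOPWORDS)


def generate_section_id(title, file_path, chapter_title, section_counter, parent_sections=None):
    """Build an ID of the assumed form sec-<chapter-slug>-<section-slug>-<hash>.

    section_counter is accepted to match the documented signature but is
    not used in the hash.
    """
    hierarchy = "|".join(simple_slugify(parent) for parent in (parent_sections or []))
    hash_input = f"{file_path}|{chapter_title}|{title}|{hierarchy}".encode("utf-8")
    hash_suffix = hashlib.sha1(hash_input).hexdigest()[:4]
    return f"sec-{simple_slugify(chapter_title)}-{simple_slugify(title)}-{hash_suffix}"


# Hypothetical file path; titles taken from the example structure.
print(generate_section_id(
    "Data Considerations", "contents/intro.qmd", "Introduction", 0,
    ["AI Evolution", "Symbolic AI Era"],
))
```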
Hash Algorithm
- Uses SHA-1 for hash generation
- Takes first 4 hex characters for the suffix
- Keeps IDs short and readable; with 16^4 = 65,536 possible suffixes, collisions are unlikely but not impossible
- Includes the file path so that identically named sections in different files hash differently
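How much protection a 4-character suffix gives can be estimated with the usual birthday-problem product. The sketch assumes SHA-1 prefixes behave like uniformly random values, and that only sections sharing the same chapter and section slug compete for a suffix (inferred from the example IDs); collision_probability is a hypothetical helper.

```python
def collision_probability(n_sections, suffix_space=16 ** 4):
    """Probability that at least two of n uniformly random suffixes coincide."""
    p_all_distinct = 1.0
    for i in range(n_sections):
        # The i-th suffix must avoid the i suffixes already taken.
        p_all_distinct *= max(0.0, 1.0 - i / suffix_space)
    return 1.0 - p_all_distinct


for n in (2, 10, 100):
    print(n, f"{collision_probability(n):.6f}")
```

For two same-named sections the chance is 1 in 65,536, which is why running --verify after bulk changes is still worthwhile.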
Troubleshooting
Common Issues
- Duplicate IDs: Should not occur with the hierarchy-based system and file path inclusion
- Changing IDs: IDs may change when document structure changes (this is expected)
- Cross-reference breaks: Use --repair to update all references
Debugging
- Use --list to see all current IDs
- Use --verify to check for missing or malformed IDs
- Check backup files if you need to revert changes
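When debugging outside the tool, a quick independent scan for duplicate or odd-looking IDs can help. This sketch is a heuristic, not a reimplementation of --verify: the ID shape (sec-, hyphenated slug, 4 hex characters) is inferred from the examples, and audit_section_ids is a hypothetical name.

```python
import re
from collections import Counter

# Assumed ID shape, inferred from the examples: sec-<slug>-<4 hex chars>.
SECTION_ID_RE = re.compile(r"\{#(sec-[^\s}]+)\}")
WELL_FORMED_RE = re.compile(r"^sec-[a-z0-9]+(?:-[a-z0-9]+)*-[0-9a-f]{4}$")


def audit_section_ids(text):
    """Return (duplicates, malformed) lists for the section IDs found in text."""
    ids = SECTION_ID_RE.findall(text)
    counts = Counter(ids)
    duplicates = sorted(id_ for id_, count in counts.items() if count > 1)
    malformed = sorted(id_ for id_ in counts if not WELL_FORMED_RE.match(id_))
    return duplicates, malformed


sample_text = (
    "## A {#sec-intro-a-d212}\n"
    "## B {#sec-intro-a-d212}\n"
    "## C {#sec-intro-c}\n"
)
print(audit_section_ids(sample_text))
```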