Add validation tool: NBGrader config validator

- Add comprehensive NBGrader configuration validator
- Validates Jupytext headers, solution blocks, cell metadata
- Checks for duplicate grade IDs and proper schema version
- Provides detailed validation reports with severity levels
Vijay Janapa Reddi
2025-11-11 19:04:58 -05:00
parent 9a0924376e
commit 91ac8458cd
2 changed files with 535 additions and 47 deletions

View File

@@ -1,67 +1,69 @@
# TinyTorch Datasets
This directory contains datasets for TinyTorch milestone examples.
## Directory Structure
```
datasets/
├── tinydigits/   ← 8×8 handwritten digits (ships with repo, ~310KB)
├── tinytalks/    ← Q&A dataset for transformers (ships with repo, ~40KB)
└── README.md     ← This file
```
## Shipped Datasets (No Download Required)
### TinyDigits
- **Used by:** Milestones 03 & 04 (MLP and CNN examples)
- **Contents:** 1,000 training + 200 test samples
- **Format:** 8×8 grayscale images, pickled
- **Size:** ~310 KB
- **Purpose:** Fast iteration on real image classification
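
A minimal loading sketch — the exact file name inside `tinydigits/` is an assumption here; adjust it to the pickle that actually ships in the directory:

```python
import pickle
from pathlib import Path

# Hypothetical file name -- check datasets/tinydigits/ for the actual pickle.
data_path = Path("datasets/tinydigits/tinydigits.pkl")
with data_path.open("rb") as f:
    digits = pickle.load(f)  # pickled 8x8 grayscale images + labels
```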
### TinyTalks
- **Used by:** Milestone 05 (Transformer/GPT examples)
- **Contents:** 350 Q&A pairs across 5 difficulty levels
- **Format:** Plain text (Q: ... A: ... format)
- **Size:** ~40 KB
- **Purpose:** Character-level conversational AI training
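
Because the format is plain `Q: ... A: ...` text, a few lines of Python are enough to read it. A sketch (the file name is assumed; use whatever ships in `tinytalks/`):

```python
# Parse "Q: ... A: ..." pairs from the plain-text file.
pairs = []
question = None
with open("datasets/tinytalks/tinytalks.txt") as f:  # assumed file name
    for line in f:
        line = line.strip()
        if line.startswith("Q:"):
            question = line[len("Q:"):].strip()
        elif line.startswith("A:") and question is not None:
            pairs.append((question, line[len("A:"):].strip()))
            question = None
```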
## Downloaded Datasets (On-Demand)
The milestones automatically download larger datasets when needed:
### MNIST
- **Used by:** `milestones/03_1986_mlp/02_rumelhart_mnist.py`
- **Downloads to:** `milestones/datasets/mnist/`
- **Contents:** 60K training + 10K test samples
- **Format:** 28×28 grayscale images
- **Size:** ~10 MB compressed
- **Auto-downloaded by:** `milestones/data_manager.py`
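
If you ever need to read the raw files yourself, MNIST uses the standard IDX layout (big-endian magic number and dimension sizes, then raw bytes). A minimal sketch, assuming the usual gzipped file names:

```python
import gzip
import struct
import numpy as np

def read_idx_images(path):
    # IDX image file: magic, count, rows, cols as big-endian uint32,
    # followed by count*rows*cols uint8 pixels.
    with gzip.open(path, "rb") as f:
        _magic, n, rows, cols = struct.unpack(">IIII", f.read(16))
        return np.frombuffer(f.read(), dtype=np.uint8).reshape(n, rows, cols)

images = read_idx_images("milestones/datasets/mnist/train-images-idx3-ubyte.gz")
```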
### CIFAR-10
- **Used by:** `milestones/04_1998_cnn/02_lecun_cifar10.py`
- **Downloads to:** `milestones/datasets/cifar-10/`
- **Contents:** 50K training + 10K test samples
- **Format:** 32×32 RGB images
- **Size:** ~170 MB compressed
- **Auto-downloaded by:** `milestones/data_manager.py`
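
The CIFAR-10 archive's "python version" batches are pickled dicts. A sketch for reading one batch manually — the extracted path below is an assumption and depends on how `data_manager.py` unpacks the archive:

```python
import pickle
import numpy as np

def load_cifar_batch(path):
    # Standard CIFAR-10 python-version batch: a pickled dict with
    # b"data" (N x 3072 uint8) and b"labels" (list of N ints).
    with open(path, "rb") as f:
        batch = pickle.load(f, encoding="bytes")
    images = batch[b"data"].reshape(-1, 3, 32, 32)  # channels-first
    labels = np.array(batch[b"labels"])
    return images, labels

# Path assumed -- adjust to the actual extraction layout.
images, labels = load_cifar_batch("milestones/datasets/cifar-10/data_batch_1")
```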
## Design Philosophy
**Shipped datasets** follow Karpathy's "~1K samples" philosophy:
- Small enough to ship with repo
- Large enough for meaningful learning
- Fast training (seconds to minutes)
- Instant gratification for students
**Downloaded datasets** are full benchmarks:
- Standard ML benchmarks (MNIST, CIFAR-10)
- Larger, slower, more realistic
- Auto-downloaded only when needed
- Used for scaling demonstrations
## Total Repository Size
- **Shipped data:** ~350 KB (tinydigits + tinytalks)
- **USB-friendly:** Entire repo fits on any device
- **Offline-capable:** Core milestones work without internet
- **Git-friendly:** No large binary files in version control

View File

@@ -0,0 +1,486 @@
#!/usr/bin/env python3
"""
NBGrader Configuration Validation Script

Validates all TinyTorch modules for NBGrader compatibility.
"""

import re
import json
from pathlib import Path
from typing import Dict, List


class NBGraderValidator:
    """Validates NBGrader configuration in Jupytext Python files."""

    def __init__(self, module_path: Path):
        self.module_path = module_path
        self.module_name = module_path.stem
        self.content = module_path.read_text()
        self.lines = self.content.split('\n')
        self.issues = []
        self.grade_ids = []
        self.cells = self._parse_cells()
    def _parse_cells(self) -> List[Dict]:
        """Parse a Jupytext file into cells."""
        cells = []
        current_cell = None
        in_metadata = False
        metadata_lines = []
        for i, line in enumerate(self.lines, 1):
            # Detect cell boundaries
            if line.startswith('# %%'):
                # Save the previous cell
                if current_cell:
                    cells.append(current_cell)
                # Start a new cell
                is_markdown = '[markdown]' in line
                current_cell = {
                    'line_start': i,
                    'type': 'markdown' if is_markdown else 'code',
                    'content': [],
                    'metadata': {},
                    'raw_line': line
                }
                # Check for inline metadata on the marker line
                if 'nbgrader=' in line:
                    try:
                        # Extract the JSON dict from the cell marker
                        match = re.search(r'nbgrader=({[^}]+})', line)
                        if match:
                            metadata_str = match.group(1)
                            # Normalize quotes so the string parses as JSON
                            metadata_str = metadata_str.replace("'", '"')
                            current_cell['metadata'] = {'nbgrader': json.loads(metadata_str)}
                    except (json.JSONDecodeError, ValueError):
                        pass
            elif current_cell:
                # Check for a metadata block at the start of the cell
                if line.strip().startswith('# metadata='):
                    in_metadata = True
                    metadata_lines = [line]
                elif in_metadata:
                    metadata_lines.append(line)
                    if line.strip() == '# ---':
                        in_metadata = False
                        # Parse the accumulated metadata block
                        try:
                            metadata_text = '\n'.join(metadata_lines)
                            # Extract the dictionary part
                            match = re.search(r'metadata=({.*?})\s*# ---', metadata_text, re.DOTALL)
                            if match:
                                metadata_str = match.group(1).replace("'", '"')
                                current_cell['metadata'] = json.loads(metadata_str)
                        except (json.JSONDecodeError, ValueError):
                            pass
                        metadata_lines = []
                else:
                    current_cell['content'].append(line)
        # Don't forget the last cell
        if current_cell:
            cells.append(current_cell)
        return cells
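
    # For reference, the Jupytext cell shapes this parser looks for.
    # This is an illustrative sketch, not an excerpt from a real module:
    #
    #   # %% [markdown]
    #   # Instructions for the student...
    #
    #   # %%
    #   # metadata={"nbgrader": {"grade": false, "solution": true,
    #   #     "grade_id": "relu-impl", "schema_version": 3, "locked": false}}
    #   # ---
    #   def relu(x):
    #       ### BEGIN SOLUTION
    #       return max(0.0, x)
    #       ### END SOLUTION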
    def validate_jupytext_header(self) -> bool:
        """Check for a proper Jupytext header in the first 15 lines."""
        header_found = False
        jupytext_marker = False
        for line in self.lines[:15]:
            if line.startswith('# ---'):
                header_found = True
            if 'jupytext:' in line or 'text_representation:' in line:
                jupytext_marker = True
        if not header_found:
            self.issues.append({
                'severity': 'P0-BLOCKER',
                'category': 'Jupytext Header',
                'line': 1,
                'issue': 'Missing Jupytext YAML header (lines 1-13)',
                'detail': 'File must start with a # --- header containing jupytext metadata'
            })
            return False
        if not jupytext_marker:
            self.issues.append({
                'severity': 'P0-BLOCKER',
                'category': 'Jupytext Header',
                'line': 1,
                'issue': 'Jupytext header missing required fields',
                'detail': 'Header must contain jupytext: and text_representation: fields'
            })
            return False
        return True
    def validate_solution_blocks(self) -> bool:
        """Check for proper BEGIN/END SOLUTION pairing."""
        begin_count = 0
        end_count = 0
        stack = []
        for i, line in enumerate(self.lines, 1):
            if '### BEGIN SOLUTION' in line:
                begin_count += 1
                stack.append(i)
            elif '### END SOLUTION' in line:
                end_count += 1
                if not stack:
                    self.issues.append({
                        'severity': 'P0-BLOCKER',
                        'category': 'Solution Blocks',
                        'line': i,
                        'issue': 'END SOLUTION without matching BEGIN',
                        'detail': f'Found ### END SOLUTION at line {i} without prior ### BEGIN SOLUTION'
                    })
                else:
                    stack.pop()
        # Report unmatched BEGINs
        if stack:
            for line_num in stack:
                self.issues.append({
                    'severity': 'P0-BLOCKER',
                    'category': 'Solution Blocks',
                    'line': line_num,
                    'issue': 'BEGIN SOLUTION without matching END',
                    'detail': f'Found ### BEGIN SOLUTION at line {line_num} without matching ### END SOLUTION'
                })
        if begin_count != end_count:
            self.issues.append({
                'severity': 'P0-BLOCKER',
                'category': 'Solution Blocks',
                'line': 0,
                'issue': f'Mismatched solution blocks: {begin_count} BEGIN vs {end_count} END',
                'detail': 'Every BEGIN SOLUTION must have exactly one END SOLUTION'
            })
            return False
        return len(stack) == 0
    def validate_cell_metadata(self) -> bool:
        """Check cell metadata for NBGrader requirements."""
        all_valid = True
        grade_ids_seen = set()
        for cell in self.cells:
            if 'nbgrader' not in cell['metadata']:
                # No metadata: flag cells that clearly should have some
                content_str = '\n'.join(cell['content'])
                # Solution cells must carry metadata
                if '### BEGIN SOLUTION' in content_str:
                    self.issues.append({
                        'severity': 'P0-BLOCKER',
                        'category': 'Cell Metadata',
                        'line': cell['line_start'],
                        'issue': 'Solution cell missing NBGrader metadata',
                        'detail': 'Cell contains BEGIN SOLUTION but no nbgrader metadata'
                    })
                    all_valid = False
                # Test cells must carry metadata
                if re.search(r'def test_unit_', content_str):
                    self.issues.append({
                        'severity': 'P0-BLOCKER',
                        'category': 'Cell Metadata',
                        'line': cell['line_start'],
                        'issue': 'Test cell missing NBGrader metadata',
                        'detail': 'Cell contains test function but no nbgrader metadata'
                    })
                    all_valid = False
                continue
            nbgrader = cell['metadata']['nbgrader']
            # Track grade_ids and check for duplicates
            if 'grade_id' in nbgrader:
                grade_id = nbgrader['grade_id']
                self.grade_ids.append(grade_id)
                if grade_id in grade_ids_seen:
                    self.issues.append({
                        'severity': 'P0-BLOCKER',
                        'category': 'Grade IDs',
                        'line': cell['line_start'],
                        'issue': f'Duplicate grade_id: {grade_id}',
                        'detail': 'Every grade_id must be unique within the module'
                    })
                    all_valid = False
                else:
                    grade_ids_seen.add(grade_id)
            # Validate test cells
            if nbgrader.get('grade') is True:
                if not nbgrader.get('locked', False):
                    self.issues.append({
                        'severity': 'P1-IMPORTANT',
                        'category': 'Test Cell',
                        'line': cell['line_start'],
                        'issue': 'Test cell not locked',
                        'detail': f'grade_id={nbgrader.get("grade_id")}: Test cells must have locked=true'
                    })
                    all_valid = False
                if 'points' not in nbgrader:
                    self.issues.append({
                        'severity': 'P0-BLOCKER',
                        'category': 'Test Cell',
                        'line': cell['line_start'],
                        'issue': 'Test cell missing points',
                        'detail': f'grade_id={nbgrader.get("grade_id")}: Graded cells must have points assigned'
                    })
                    all_valid = False
                if nbgrader.get('solution', False):
                    self.issues.append({
                        'severity': 'P1-IMPORTANT',
                        'category': 'Test Cell',
                        'line': cell['line_start'],
                        'issue': 'Test cell marked as solution',
                        'detail': f'grade_id={nbgrader.get("grade_id")}: Test cells should have solution=false'
                    })
                    all_valid = False
            # Validate solution cells
            if nbgrader.get('solution') is True:
                if nbgrader.get('grade', False):
                    self.issues.append({
                        'severity': 'P2-ADVISORY',
                        'category': 'Solution Cell',
                        'line': cell['line_start'],
                        'issue': 'Solution cell marked for grading',
                        'detail': f'grade_id={nbgrader.get("grade_id")}: Solution cells typically have grade=false'
                    })
                if nbgrader.get('locked', False):
                    self.issues.append({
                        'severity': 'P1-IMPORTANT',
                        'category': 'Solution Cell',
                        'line': cell['line_start'],
                        'issue': 'Solution cell is locked',
                        'detail': f'grade_id={nbgrader.get("grade_id")}: Solution cells should have locked=false'
                    })
                    all_valid = False
        return all_valid
    def validate_cell_types(self) -> bool:
        """Verify proper cell type markers."""
        all_valid = True
        for i, line in enumerate(self.lines, 1):
            # Catch malformed markers: too many percent signs ("# %%%") or a
            # missing space ("#%%"). A well-formed marker is exactly "# %%",
            # which matches neither prefix below.
            if line.startswith('# %%%') or line.startswith('#%%'):
                self.issues.append({
                    'severity': 'P1-IMPORTANT',
                    'category': 'Cell Type',
                    'line': i,
                    'issue': 'Invalid cell marker syntax',
                    'detail': f'Cell marker must be "# %%" or "# %% [markdown]", found: {line[:30]}'
                })
                all_valid = False
        return all_valid
    def check_schema_version(self) -> bool:
        """Check that every nbgrader cell declares schema version 3."""
        all_valid = True
        for cell in self.cells:
            if 'nbgrader' in cell['metadata']:
                schema_version = cell['metadata']['nbgrader'].get('schema_version')
                if schema_version != 3:
                    self.issues.append({
                        'severity': 'P2-ADVISORY',
                        'category': 'Schema Version',
                        'line': cell['line_start'],
                        'issue': f'NBGrader schema version is {schema_version}, expected 3',
                        'detail': 'Schema version 3 is the current standard'
                    })
                    all_valid = False
        return all_valid
    def run_all_validations(self) -> Dict:
        """Run all validation checks."""
        results = {
            'module': self.module_name,
            'path': str(self.module_path),
            'checks': {
                'jupytext_header': self.validate_jupytext_header(),
                'solution_blocks': self.validate_solution_blocks(),
                'cell_metadata': self.validate_cell_metadata(),
                'cell_types': self.validate_cell_types(),
                'schema_version': self.check_schema_version(),
            },
            'issues': self.issues,
            'grade_ids': self.grade_ids,
            'cell_count': len(self.cells),
            'status': 'PASS' if not self.issues else 'FAIL'
        }
        # Count issues by severity
        results['issue_count'] = {
            'P0-BLOCKER': len([i for i in self.issues if i['severity'] == 'P0-BLOCKER']),
            'P1-IMPORTANT': len([i for i in self.issues if i['severity'] == 'P1-IMPORTANT']),
            'P2-ADVISORY': len([i for i in self.issues if i['severity'] == 'P2-ADVISORY']),
        }
        return results
def validate_all_modules(modules_dir: Path) -> Dict:
    """Validate every numbered module directory under modules_dir."""
    results = {}
    for module_dir in sorted(modules_dir.glob('[0-9][0-9]_*')):
        module_py_files = list(module_dir.glob('*.py'))
        # Filter out test, validation, and generated files
        module_py_files = [f for f in module_py_files if not any(
            exclude in f.name for exclude in ['test_', 'validate_', 'analysis', '__']
        )]
        if module_py_files:
            # Use the first non-test Python file found (covers both numbered
            # files like 01_tensor.py and named files like tensor.py)
            module_file = module_py_files[0]
            validator = NBGraderValidator(module_file)
            results[module_dir.name] = validator.run_all_validations()
    return results
def print_validation_report(results: Dict):
    """Print a comprehensive validation report."""
    print("=" * 100)
    print("NBGrader Configuration Validation Report")
    print("=" * 100)
    print()

    # Summary statistics
    total_modules = len(results)
    passed_modules = sum(1 for r in results.values() if r['status'] == 'PASS')
    failed_modules = total_modules - passed_modules
    total_blockers = sum(r['issue_count']['P0-BLOCKER'] for r in results.values())
    total_important = sum(r['issue_count']['P1-IMPORTANT'] for r in results.values())
    total_advisory = sum(r['issue_count']['P2-ADVISORY'] for r in results.values())

    print("SUMMARY:")
    print(f"  Total Modules: {total_modules}")
    print(f"  Passed: {passed_modules}")
    print(f"  Failed: {failed_modules}")
    print(f"  Overall Status: {'PASS' if failed_modules == 0 else 'FAIL'}")
    print()
    print("ISSUE BREAKDOWN:")
    print(f"  P0-BLOCKER (Critical): {total_blockers}")
    print(f"  P1-IMPORTANT: {total_important}")
    print(f"  P2-ADVISORY: {total_advisory}")
    print(f"  Total Issues: {total_blockers + total_important + total_advisory}")
    print()

    # Per-module status matrix
    print("=" * 100)
    print("MODULE VALIDATION MATRIX")
    print("=" * 100)
    print(f"{'Module':<25} {'Status':<8} {'Cells':<7} {'P0':<5} {'P1':<5} {'P2':<5} {'Grade IDs':<12}")
    print("-" * 100)
    for module_name, result in sorted(results.items()):
        status_icon = "PASS" if result['status'] == 'PASS' else "FAIL"
        print(f"{module_name:<25} {status_icon:<8} {result['cell_count']:<7} "
              f"{result['issue_count']['P0-BLOCKER']:<5} "
              f"{result['issue_count']['P1-IMPORTANT']:<5} "
              f"{result['issue_count']['P2-ADVISORY']:<5} "
              f"{len(result['grade_ids']):<12}")
    print()

    # Detailed issues by module
    print("=" * 100)
    print("DETAILED ISSUES BY MODULE")
    print("=" * 100)
    for module_name, result in sorted(results.items()):
        if result['issues']:
            print()
            print(f"MODULE: {module_name}")
            print(f"Path: {result['path']}")
            print(f"Status: {result['status']}")
            print("-" * 100)
            # Group issues by severity
            for severity in ['P0-BLOCKER', 'P1-IMPORTANT', 'P2-ADVISORY']:
                severity_issues = [i for i in result['issues'] if i['severity'] == severity]
                if severity_issues:
                    print(f"\n  {severity}:")
                    for issue in severity_issues:
                        print(f"    Line {issue['line']:4d} | {issue['category']:<20} | {issue['issue']}")
                        print(f"      {issue['detail']}")

    # Check summary
    print()
    print("=" * 100)
    print("VALIDATION CHECK SUMMARY")
    print("=" * 100)
    check_names = ['jupytext_header', 'solution_blocks', 'cell_metadata', 'cell_types', 'schema_version']
    for check in check_names:
        passed = sum(1 for r in results.values() if r['checks'][check])
        failed = total_modules - passed
        status = "PASS" if failed == 0 else "FAIL"
        print(f"  {check.replace('_', ' ').title():<30} {status:<8} ({passed}/{total_modules} modules)")

    print()
    print("=" * 100)
    print("RECOMMENDATIONS")
    print("=" * 100)
    if total_blockers > 0:
        print("\nCRITICAL BLOCKERS (P0) - Must fix before NBGrader deployment:")
        print("  These issues will prevent NBGrader from functioning correctly.")
        print("  Priority: Fix immediately")
    if total_important > 0:
        print("\nIMPORTANT ISSUES (P1) - Should fix soon:")
        print("  These issues may cause NBGrader to behave unexpectedly.")
        print("  Priority: Fix before student deployment")
    if total_advisory > 0:
        print("\nADVISORY ISSUES (P2) - Consider fixing:")
        print("  These issues are minor but should be addressed for consistency.")
        print("  Priority: Fix when convenient")
    print()
if __name__ == "__main__":
    import sys

    # Default to a modules/ directory relative to the current working
    # directory; an explicit path can be passed as the first argument.
    modules_dir = Path(sys.argv[1]) if len(sys.argv) > 1 else Path("modules")
    results = validate_all_modules(modules_dir)
    print_validation_report(results)

    # Save the detailed results to JSON in the current directory
    output_file = Path("nbgrader_validation_results.json")
    with output_file.open('w') as f:
        json.dump(results, f, indent=2)
    print(f"\nDetailed results saved to: {output_file}")