mirror of https://github.com/harvard-edge/cs249r_book.git synced 2026-05-08 09:57:21 -05:00

Vijay Janapa Reddi 9a0d14681f chore(audit/notation): extend accept-list with 20 editorially-verified false positives

The notation-consistency scanner flags bare P as 'possible peak performance' and
bare L as 'possible latency'. In practice these occurrences across vol2 are
legitimate uses of P/L for distinct concepts (probability functions like P(X),
parameter count P, prompt-token count P, layer-count L, batch-size ratio
L/L_max). Adding each to the accept-list (matched on file + exact source line)
so the check stays useful for catching genuine convention drift.

Files affected: vol2/{compute_infrastructure, distributed_training, inference,
ops_scale, performance_engineering, responsible_ai, robust_ai}.qmd.

2026-05-06 08:11:08 -04:00


book/tools/audit — Pass 15 Audit-Fix-Verify Loop

Automated editorial audit pipeline that scans textbook content against the MIT Press round 1 style rules, applies safe fixes under strict safety gates, and verifies the result.

Status: Phase A complete (infrastructure + 7 check categories).

Plan: /Users/VJ/Desktop/MIT_Press_Feedback/15_audit_loop/PLAN.md

Rules: /Users/VJ/GitHub/AIConfigs/projects/MLSysBook/.claude/rules/book-prose-merged.md

The five-stage cycle

┌────────┐   ┌────────┐   ┌────────┐   ┌────────┐   ┌─────────┐
│ 1 SCAN │ → │ 2 PLAN │ → │ 3 FIX  │ → │ 4 VERIFY│ → │ 5 REPORT│
└────────┘   └────────┘   └────────┘   └────────┘   └─────────┘
 read-only    classify     script       3 checks      commit
              by lane      lane         must pass

Verification is the load-bearing stage. If any check fails, the cycle rolls back and does NOT retry. See verify.py for the three verification stages.

File layout

book/tools/audit/
├── README.md                      — this file
├── __init__.py
├── protected_contexts.py          — LineWalker + inline span detection
├── ledger.py                      — Issue + Ledger JSON model
├── scan.py                        — SCAN stage + CLI
├── accept_list.py                 — persistent FP accept-list (Pass 16 Item A)
├── accepted_fps.json              — seeded from Pass 15 FINAL ledgers (75 entries)
├── fix_script_lane.py             — FIX stage (script lane) + 5 safety checks
├── verify.py                      — VERIFY stage (3 checks)
├── loop.py                        — orchestrator CLI
├── checks/
│   ├── __init__.py
│   ├── vs_period.py               — bare 'vs' → 'vs.'
│   ├── compound_prefix.py         — pre-/non- close-up (strict 6-term list)
│   ├── percent_symbol.py          — '%' → 'percent' in body prose
│   ├── lowercase_prose_references.py  — 'Chapter 12' → 'chapter 12'
│   ├── acknowledgements_spelling.py   — British → American
│   ├── binary_units.py            — GiB/TiB in prose (detection only)
│   └── h3_titlecase.py            — H3+ headings in title case (detection only)
└── subagent_prompts/              — (Phase B) prompts for judgment-required checks
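
As an illustration of what a script-lane check does, here is a minimal sketch of the bare 'vs' → 'vs.' rule. The pattern and the function name `fix_vs_period` are assumptions made for this example, not the contents of checks/vs_period.py, and the sketch ignores the protected contexts (inline code, citations, and so on) that protected_contexts.py exists to guard.

```python
import re
from typing import Tuple

# Hypothetical pattern: a standalone "vs" that is not already followed by a period.
BARE_VS = re.compile(r"\bvs\b(?!\.)")


def fix_vs_period(line: str) -> Tuple[str, int]:
    """Return the fixed line and the number of replacements made."""
    return BARE_VS.subn("vs.", line)


if __name__ == "__main__":
    print(fix_vs_period("CPU vs GPU throughput"))
    print(fix_vs_period("CPU vs. GPU throughput"))
```

Returning the replacement count alongside the text is a deliberate choice in this sketch: a caller can sum the counts to predict the file's size change before anything is written.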

Persistent accept-list (Pass 16 Item A)

accept_list.py and accepted_fps.json together encode Pass 15's editorial verdict on 75 h3-titlecase scanner false positives (proper-noun-heavy headings, named principles, legislation, after-colon CMS 8.158 caps, D·A·M/C³ taxonomy axes). After every scan, matching issues are flipped from open to accepted and tagged with the §10.9 sub-rule that justifies them. The match key is (category, repo-relative file, exact before line), so if a heading is intentionally edited, its accept-list entry stops matching and the issue correctly returns to open for re-review.

# Default: accept-list applied, summary shows matched + stale counts
python3 book/tools/audit/scan.py --scope vol1 -v

# Reproduce pre-Pass-16 behavior (all 75 FPs report as open)
python3 book/tools/audit/scan.py --scope vol1 --no-accept-list -v

# Use a different accept-list file (e.g. a draft to iterate on)
python3 book/tools/audit/scan.py --scope vol1 --accept-list /tmp/draft.json
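
The matching rule described above can be sketched as follows. This is a minimal illustration under stated assumptions: the `Issue` fields, the `apply_accept_list` function, and the dictionary shape of the accept-list are hypothetical and are not taken from ledger.py or accept_list.py.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, str, str]  # (category, repo-relative file, exact "before" line)


@dataclass
class Issue:
    """Hypothetical stand-in for a ledger issue."""
    category: str
    file: str
    before: str
    status: str = "open"
    sub_rule: Optional[str] = None


def apply_accept_list(issues: List[Issue], accepted: Dict[Key, str]) -> Tuple[int, int]:
    """Flip matching open issues to accepted; return (matched, stale) counts.

    `accepted` maps a match key to the sub-rule tag that justifies it.
    An entry is stale when no issue in this scan matches it.
    """
    used = set()
    matched = 0
    for issue in issues:
        key = (issue.category, issue.file, issue.before)
        if issue.status == "open" and key in accepted:
            issue.status = "accepted"
            issue.sub_rule = accepted[key]
            used.add(key)
            matched += 1
    stale = len(accepted) - len(used)
    return matched, stale
```

Because the exact source line is part of the key, editing a heading changes the key and the issue falls back to open without any extra bookkeeping.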

CLI usage

Run all commands from the repo root.

Scan only (dry run)

python3 book/tools/audit/scan.py --scope vol2 --verbose
python3 book/tools/audit/scan.py --scope vol1 --output vol1-ledger.json --verbose

Produces audit-ledger.json (or the path given by --output).

Fix one category (dry run)

python3 book/tools/audit/fix_script_lane.py \
    --ledger audit-ledger.json \
    --categories vs-period \
    --dry-run --verbose

Run the full loop

# Dry run (scan + plan + report, no file changes)
python3 book/tools/audit/loop.py --scope vol2 --dry-run --verbose

# Apply, verify, but don't commit
python3 book/tools/audit/loop.py --scope vol2 \
    --categories vs-period,compound-prefix-closeup \
    --apply --verbose

# Apply, verify, and commit each iteration
python3 book/tools/audit/loop.py --scope vol2 \
    --categories vs-period,compound-prefix-closeup \
    --apply --commit-each-iteration --verbose

# Add quarto check (expensive) to verify stage
python3 book/tools/audit/loop.py --scope vol2 \
    --categories vs-period --apply --quarto-check --verbose

Check categories

| Category | Rule | Lane | Notes |
|---|---|---|---|
| `vs-period` | book-prose-merged §10.10 | script | Proven from pass 10b |
| `compound-prefix-closeup` | §10.8 | script | Strict 6-term list, no extrapolation |
| `percent-symbol` | §10.2 | script | HTML attribute filter (width=N%) |
| `lowercase-prose-references` | §10.4 | script | Hand-written "Chapter 12" |
| `acknowledgements-spelling` | §10.7 | script | British → American |
| `binary-units-in-prose` | §1 | accept | Detection only; needs human |
| `h3-titlecase` | §10.9 | subagent | Per-heading judgment required |

Phase A implements detection for all 7 categories above. Phase B adds parallel subagent dispatch for h3-titlecase.

Validation anchors (Phase A baseline)

Scan times on a cold run from the repo root:

$ python3 book/tools/audit/scan.py --scope vol1 -v
Total: 629 issues across 34 files (0.4s)

$ python3 book/tools/audit/scan.py --scope vol2 -v
Total: 969 issues across 39 files (0.4s)

Per-category counts (baseline for regression detection):

| Category | vol1 | vol2 |
|---|---|---|
| vs-period | 0 | 16 |
| compound-prefix-closeup | 19 | 46 |
| percent-symbol | 1 | 160 |
| lowercase-prose-references | 0 | 0 |
| acknowledgements-spelling | 0 | 0 |
| binary-units-in-prose | 0 | 0 |
| h3-titlecase | 609 | 747 |

The h3-titlecase count of 609 on vol1 is close to the Pass 15 plan's expected ~611 (off by 2, within tolerance). The vs-period count of 0 on vol1 confirms pass 10b's work is intact.
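
One way to use these counts for regression detection is sketched below. The function name and the rule that any increase over baseline counts as a regression are assumptions for illustration; the numbers are the vol1 column of the table above.

```python
from typing import Dict, Tuple

# vol1 Phase A baseline, copied from the per-category table.
BASELINE_VOL1: Dict[str, int] = {
    "vs-period": 0,
    "compound-prefix-closeup": 19,
    "percent-symbol": 1,
    "lowercase-prose-references": 0,
    "acknowledgements-spelling": 0,
    "binary-units-in-prose": 0,
    "h3-titlecase": 609,
}


def find_regressions(
    baseline: Dict[str, int], current: Dict[str, int]
) -> Dict[str, Tuple[int, int]]:
    """Return {category: (baseline_count, current_count)} for every increase.

    Categories missing from the baseline are treated as having a baseline of 0.
    """
    return {
        category: (baseline.get(category, 0), count)
        for category, count in current.items()
        if count > baseline.get(category, 0)
    }


if __name__ == "__main__":
    current = dict(BASELINE_VOL1, **{"vs-period": 2})
    print(find_regressions(BASELINE_VOL1, current))
```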

Post-Pass-16 anchor (2026-04-08, Items A+B+C+D)

After Pass 15's 847 editorial fixes, Pass 16 Item A's persistent accept-list, Pass 16 Item B's h3_titlecase detector improvements, the LineWalker $$ {#eq-label} display-math fix, Pass 16 Item C's new concept-term-capitalization check, and Pass 16 Item D's new abbreviation-first-use check, the scanner reports the following steady state:

| Category | vol1 open | vol2 open |
|---|---|---|
| vs-period | 0 | 0 |
| compound-prefix-closeup | 0 | 0 |
| percent-symbol | 0 | 0 |
| lowercase-prose-references | 0 | 0 |
| acknowledgements-spelling | 0 | 0 |
| binary-units-in-prose | 0 | 0 |
| h3-titlecase | 61 | 29 |
| concept-term-capitalization | 19 | 62 |
| abbreviation-first-use | 163 | 111 |
| TOTAL | 243 | 202 |

All 445 open entries are editorial work newly surfaced by the improved detector and new check categories. Breakdown:

vol1 h3-titlecase (61): 42 from Item B's compound-second-part rule and concept-term override + 19 made visible by the LineWalker fix.
vol2 h3-titlecase (29): All 29 from the LineWalker fix, concentrated in performance_engineering.qmd, which had a $$ {#eq-iron-law-perf} at line 87 that suppressed scanning of lines 88-2089.
vol1 concept-term (19): §10.3 lowercase-concept-term violations in body prose (Iron Law, Data Gravity, Information Roofline, Scaling Laws, ...).
vol2 concept-term (62): Same check, much larger since vol2 was not swept for §10.3 in round 1 pass 4.
vol1 abbreviation-first-use (163): §10.5 violations where a bare abbreviation appears before its canonical introduction in the same chapter (TPU, LLM, CNN, GEMM, ReLU, ...).
vol2 abbreviation-first-use (111): Same check on vol2. Top offenders are TPU (16), LLM (11), Adam (9).

All entries are editorial work for a separate content-edit pass. Item B/C/D scope is scanner engineering, not content fixes.
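
The LineWalker fix mentioned above concerns display-math delimiters that carry a Quarto equation label. The sketch below shows one plausible mechanism, offered as an assumption because the real LineWalker code is not reproduced here: if only a bare `$$` toggles the math state, a closing `$$ {#eq-label}` line never closes the block and every later line is skipped. Treating the labelled form as a delimiter avoids that. The names `DELIMITER` and `scannable_lines` are hypothetical.

```python
import re
from typing import Iterable, Iterator, Tuple

# A display-math delimiter line: "$$" alone, or "$$ {#eq-some-label}".
DELIMITER = re.compile(r"^\$\$(\s*\{#eq-[^}]+\})?\s*$")


def scannable_lines(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, line) for lines outside $$ ... $$ display-math blocks."""
    in_math = False
    for number, line in enumerate(lines, start=1):
        if DELIMITER.match(line.strip()):
            in_math = not in_math  # opening or closing delimiter
            continue
        if not in_math:
            yield number, line


if __name__ == "__main__":
    sample = ["$$", "T = O / R", "$$ {#eq-iron-law-perf}", "### A heading to scan"]
    print(list(scannable_lines(sample)))
```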

The accept-list (accepted_fps.json) remains at 0 entries. No new FPs have been identified under the improved detector + new categories. Item D uses file-level exclusions for glossary.qmd (glossaries are definitions, not first uses) and excludes the SIFT homonym (CV meaning vs fault-tolerance meaning).

Reproduce with python3 book/tools/audit/scan.py --scope vol1 -v. Run the detector self-tests with:

PYTHONPATH=book/tools python3 book/tools/audit/checks/h3_titlecase.py
PYTHONPATH=book/tools python3 book/tools/audit/checks/concept_term_capitalization.py
PYTHONPATH=book/tools python3 book/tools/audit/checks/abbreviation_first_use.py

Expect 41/41 passed, 32/32 passed, and 17/17 passed, respectively.

Safety invariants

The script lane runs five checks before writing any file. A failure on any one causes immediate rollback:

1. No null bytes: leftover nulls from broken sentinel pipelines
2. No leftover sentinels: ⟦SENT0⟧-style markers from stash/restore
3. Byte delta matches expectation: this check caught the discarded bulk run
4. Quarto structural delta is zero: fence/div/YAML counts unchanged
5. No new issues introduced: re-runs ALL check modules on the new text

Safety check #3 (byte delta) is the most important. If you close up N occurrences of pre-training (to pretraining, -1 char each) and fix M bare vs (to vs., +1 char each), the file delta must be exactly -N + M. Anything else means the script touched content it shouldn't have, which is the exact failure mode of the discarded bulk-edit run.
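
The first three safety checks can be sketched in a few lines. This is an illustration under assumptions, not the code in fix_script_lane.py: the function names are hypothetical, and the sentinel test only looks for the `⟦SENT` prefix taken from the example marker above.

```python
def has_null_bytes(text: str) -> bool:
    """Check 1: any NUL character means a broken pipeline left debris."""
    return "\x00" in text


def has_sentinels(text: str) -> bool:
    """Check 2: assumed sentinel prefix, based on the ⟦SENT0⟧ example."""
    return "⟦SENT" in text


def byte_delta_ok(before: str, after: str, closeups: int, vs_fixes: int) -> bool:
    """Check 3: the size change must equal -closeups + vs_fixes exactly.

    Each close-up (pre-training to pretraining) removes one byte and each
    vs fix (vs to vs.) adds one byte; both edits are plain ASCII.
    """
    actual = len(after.encode("utf-8")) - len(before.encode("utf-8"))
    return actual == -closeups + vs_fixes


if __name__ == "__main__":
    before = "pre-training vs fine-tuning\n"
    after = "pretraining vs. fine-tuning\n"
    print(byte_delta_ok(before, after, closeups=1, vs_fixes=1))
```

Measuring in encoded bytes rather than characters keeps the check honest if a stray edit touches a non-ASCII character elsewhere in the file.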

Stopping conditions (hard-coded)

Per Pass 15 plan section 2.5:

Zero issues remaining in active categories → exit 0 (success)
No progress in an iteration → exit 2 (stuck)
Verification failure → exit 3 (do not retry)
Time budget exceeded (default 30 min wall) → exit 4 (budget)
Max iterations reached → exit 4 (budget)
Commit failure → exit 5
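
The exit codes above can be captured in a small enum. The sketch below is hypothetical: the names, the order in which conditions are tested, and the `max_iterations` parameter are assumptions for illustration, since this README lists the codes but not how loop.py evaluates them. Only the 30-minute default budget comes from the list above.

```python
from enum import IntEnum
from typing import Optional


class ExitCode(IntEnum):
    SUCCESS = 0        # zero issues remaining in active categories
    STUCK = 2          # no progress in an iteration
    VERIFY_FAILED = 3  # verification failure, do not retry
    BUDGET = 4         # time budget exceeded or max iterations reached
    COMMIT_FAILED = 5  # commit failure


def stopping_code(
    *,
    open_before: int,
    open_after: int,
    verified: bool,
    commit_ok: bool,
    elapsed_min: float,
    iteration: int,
    max_iterations: int,
    budget_min: float = 30.0,
) -> Optional[ExitCode]:
    """Return the exit code for this iteration, or None to keep looping."""
    if not verified:
        return ExitCode.VERIFY_FAILED
    if not commit_ok:
        return ExitCode.COMMIT_FAILED
    if open_after == 0:
        return ExitCode.SUCCESS
    if open_after >= open_before:
        return ExitCode.STUCK
    if elapsed_min > budget_min or iteration >= max_iterations:
        return ExitCode.BUDGET
    return None
```

Testing verification first reflects the rule that a failed verification is never retried, whatever else happened in the iteration.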

Adversarial test coverage

protected_contexts.py: 14 adversarial tests covering every failure mode from the discarded bulk-edit run (bold definition, callout title, table header, sentence start, index entry, @-ref, citation, footnote ref, fig-cap attr, inline code, inline math).
compound_prefix.py: 21 tests covering the strict 6-term list, domain-compound preservation, acronym/proper-noun continuation, and case preservation.
vs_period.py: validated against pass 10b's claim (vol1 clean, vol2 still has 16 real hits).

The adversarial tests live inline in each check module; run the module directly during development. A pytest-based test harness is a Phase C deliverable.

Do not

Do not run --apply against vol1 without explicit human approval per category. Vol1 is the MIT Press deliverable.
Do not skip the verify stage (--dry-run is the only exception).
Do not retry a failed verification. Inspect with git diff and either commit or roll back manually.
Do not add a new category without a baseline count — every check must be validated against a known-good state before being trusted.
Do not commit to main or dev. Always commit to the feature branch (feat/mitpress-vol1-copyedit-r1 at the time of writing).

See Pass 15 plan section 10 for the complete "do not" list.