docs(vault): document v1.1 sidecar + hierarchy + tier model

Phase 4.8 of CHAIN_ROADMAP.md.

ARCHITECTURE.md gains a new §3.6 capturing the three deltas that landed
during the chain workstream — additive to v1, not replacements:
  - hierarchical question layout (`<track>/<area>/<id>.yaml`)
  - sidecar chain architecture (chains.json authoritative; YAML chains:
    field retired)
  - chain tier model (primary/secondary, default-primary on read)

README.md updates:
  - status line: v1.1, points at CHAIN_ROADMAP.md and ARCHITECTURE.md §3.6
  - new "Chain build pipeline" section with the diagnose / build /
    apply / merge invocations
  - layout listing reflects scripts/ and the actual src/ contents
    (was stuck on Phase 0 scaffolding shape)

No code changes. The v1 release-pipeline invariants absorb the v1.1
deltas without modification (chains.json is a Merkle leaf; tier flows
into that leaf transparently).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Vijay Janapa Reddi
2026-04-30 20:26:09 -04:00
parent ed2ddb51dc
commit f086b6f42f
2 changed files with 108 additions and 12 deletions


@@ -2,8 +2,11 @@
Authoring, building, and releasing the StaffML question vault.
> **Status**: Phase 0 scaffolding. Subcommands land in Phase 1+ per
> [`ARCHITECTURE.md`](../vault/ARCHITECTURE.md) §14.
> **Status**: v1.1 — sidecar chain architecture + tier-aware UI in place.
> Chain corpus growth tracked in
> [`docs/CHAIN_ROADMAP.md`](docs/CHAIN_ROADMAP.md); design baseline is
> [`../vault/ARCHITECTURE.md`](../vault/ARCHITECTURE.md) (§3.6 captures
> the v1.1 deltas).
## Install (local editable)
@@ -69,6 +72,41 @@ All commands support `--json` for machine-readable output per
[`docs/JSON_OUTPUT.md`](docs/JSON_OUTPUT.md). Exit codes are stable per
[`docs/EXIT_CODES.md`](docs/EXIT_CODES.md).
## Chain build pipeline (v1.1+)
Chains are pedagogical progressions through Bloom levels (L1→L6+) within
one (track, topic) bucket. `interviews/vault/chains.json` is the
authoritative registry; YAMLs no longer carry a `chains:` field. The
build tooling lives in `scripts/`:
```bash
# 1. Surface (track, topic) buckets that need chains. Writes
# interviews/vault/chain-coverage.json (gitignored — regeneratable).
python3 scripts/diagnose_chain_coverage.py
# 2. Strict pass: Δ ∈ {1, 2}, primary chains. Default mode.
python3 scripts/build_chains_with_gemini.py --all \
--output ../vault/chains.proposed.json
# 3. Lenient pass: Δ ∈ {0, 1, 2, 3}, secondary chains.
# Use --buckets-from to scope the run to uncovered buckets only.
python3 scripts/build_chains_with_gemini.py --mode lenient \
--buckets-from ../vault/chain-coverage.json \
--output ../vault/chains.proposed.lenient.json
# 4. Apply a single proposed file (replaces chains.json after validation).
python3 scripts/apply_proposed_chains.py --proposed ../vault/chains.proposed.json
# 5. Merge primary + secondary into chains.json with cap enforcement
# (each qid in ≤ 2 chains; non-L1/L2 qids capped at 1 membership).
python3 scripts/merge_chain_passes.py
```
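The cap described in step 5 can be sketched as a small membership check. This is an illustration only, not the logic of `merge_chain_passes.py`: the chain shape (a dict with a `questions` list of qids), the `levels` mapping from qid to Bloom level, and the function name `cap_violations` are all assumptions made for the example.

```python
from collections import Counter


def cap_violations(chains, levels, max_memberships=2):
    """Return {qid: count} for qids that exceed their membership cap.

    Assumed shapes (not confirmed by the source): each chain is a dict with
    a "questions" list of qids, and `levels` maps qid -> Bloom level (1..6).
    """
    counts = Counter(qid for chain in chains for qid in chain["questions"])
    violations = {}
    for qid, n in counts.items():
        # L1/L2 questions may sit in up to two chains; all others in one.
        cap = max_memberships if levels[qid] <= 2 else 1
        if n > cap:
            violations[qid] = n
    return violations


# Hypothetical data: q1 is L1 (cap 2), q3 is L3 (cap 1).
chains = [
    {"id": "c1", "questions": ["q1", "q3"]},
    {"id": "c2", "questions": ["q1", "q3"]},
]
levels = {"q1": 1, "q3": 3}
print(cap_violations(chains, levels))  # {'q3': 2}
```

Returning the offending qids rather than raising lets a merge step decide which chain to drop or trim.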
Both `apply_proposed_chains.py` and the validator tolerate a missing
`tier` field on chain entries (defaulting to "primary"); chains
produced by `--mode lenient` are tagged `tier: "secondary"`. After any
change, run `vault check --strict` and `vault build --legacy-json`.
## Run tests
```bash
@@ -81,18 +119,27 @@ pytest interviews/vault-cli/tests/
```
vault-cli/
├── pyproject.toml
├── README.md # this file
├── README.md # this file
├── docs/
│ ├── EXIT_CODES.md # stable exit-code taxonomy
│ ├── JSON_OUTPUT.md # per-command --json schemas
│ └── CUTOVER_QA.md # manual cutover QA checklist
├── src/vault_cli/
│ ├── CHAIN_ROADMAP.md # resumable chain-coverage workstream
│ ├── EXIT_CODES.md # stable exit-code taxonomy
│ ├── JSON_OUTPUT.md # per-command --json schemas
│ └── CUTOVER_QA.md # manual cutover QA checklist
├── src/vault_cli/ # Typer app + library
│ ├── __init__.py
│ ├── _version.py
│ ├── exit_codes.py
│ └── main.py # Typer app entry
└── tests/
└── test_smoke.py # Phase 0 smoke tests
│ ├── compiler.py / loader.py / yaml_io.py
│ ├── legacy_export.py # corpus.json + chain_tiers emitter
│ ├── policy.py # release-policy filter
│ ├── validator.py # fast / structural / slow tiers
│ └── main.py # Typer app entry
├── scripts/ # ops + Gemini-powered tools
│ ├── diagnose_chain_coverage.py # surface uncovered buckets
│ ├── build_chains_with_gemini.py # --mode {strict,lenient}
│ ├── apply_proposed_chains.py # gate proposed chains.json
│ ├── merge_chain_passes.py # primary + secondary, cap-enforced
│ ├── summarize_proposed_chains.py # quick-read review
│ └── ... # auditing, calibration, D1 emit, etc.
└── tests/ # pytest suite (74 tests today)
```
## Architecture


@@ -348,6 +348,55 @@ release_hash = sha256(b"\n".join(f"{id}:{h}".encode() for id, h in leaves))
---
### 3.6 v1.1 architecture updates (post-Phase-1/2 — chain build)
After the v1.0 design doc above was written, three deltas landed during
the corpus growth workstream tracked in
[`vault-cli/docs/CHAIN_ROADMAP.md`](../vault-cli/docs/CHAIN_ROADMAP.md).
They are additive to the v1 invariants, not replacements:
**1. Hierarchical question layout.** Questions live at
`interviews/vault/questions/<track>/<area>/<id>.yaml` (the v1 design
above sketched `<track>/L<N>/<zone>/`; the layout that actually landed is
`<track>/<area>/`). The hierarchy is a build-time concern — `corpus.json`
is path-agnostic, so site/runtime code is unaffected. `vault check
--strict` enforces a path-vs-body invariant: the directory shards
(track/area) must match the YAML body's `track`/`competency_area` fields.
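The path-vs-body invariant can be sketched as follows. This is a minimal illustration, not the `vault check --strict` implementation: the function name `path_matches_body`, the convention that paths are given relative to `questions/`, and the sample track/area values are assumptions.

```python
from pathlib import PurePosixPath


def path_matches_body(rel_path, body):
    """Check a <track>/<area>/<id>.yaml path against the parsed YAML body.

    `rel_path` is assumed to be relative to the questions/ directory and
    `body` is assumed to be the YAML document already parsed into a dict.
    """
    parts = PurePosixPath(rel_path).parts
    if len(parts) != 3:
        # Anything other than track/area/file is outside the layout.
        return False
    track, area, _filename = parts
    return track == body.get("track") and area == body.get("competency_area")


# Hypothetical values, for illustration only.
body = {"track": "cloud", "competency_area": "networking"}
print(path_matches_body("cloud/networking/q-0001.yaml", body))  # True
print(path_matches_body("cloud/storage/q-0001.yaml", body))     # False
```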
**2. Sidecar chain architecture.** `interviews/vault/chains.json` is the
authoritative chain registry. Per-question YAMLs no longer carry a
`chains:` field — that field was retired in v1.1 to keep chain rebuilds
to a single-file edit instead of touching 2k+ YAMLs. The exporter
(`vault_cli.legacy_export._build_chain_index`) joins YAML + chains.json
to produce per-question `chain_ids` / `chain_positions` in the runtime
`corpus.json`. §3.3's chain-reference rules still hold for the
**runtime artifact** (multi-chain membership, position monotonic, etc.);
they no longer apply to YAML source.
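A rough sketch of the join, under stated assumptions: this is not the body of `_build_chain_index`, and the shapes used here (a chain as a dict with `id` and an ordered `questions` list, `chain_positions` as a chain-id-to-position mapping, zero-based positions) are guesses chosen for illustration.

```python
def build_chain_index(chains):
    """Invert a chain registry into per-question membership records.

    Assumed input: a list of {"id": str, "questions": [qid, ...]} dicts,
    with `questions` already in chain order.
    """
    index = {}
    for chain in chains:
        for position, qid in enumerate(chain["questions"]):
            entry = index.setdefault(
                qid, {"chain_ids": [], "chain_positions": {}}
            )
            entry["chain_ids"].append(chain["id"])
            entry["chain_positions"][chain["id"]] = position
    return index


# Hypothetical registry: q2 belongs to two chains.
registry = [
    {"id": "c1", "questions": ["q1", "q2"]},
    {"id": "c2", "questions": ["q2", "q3"]},
]
index = build_chain_index(registry)
print(index["q2"])
```

Because the index is derived entirely from the registry, rebuilding chains touches one file and the YAML sources stay untouched.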
**3. Chain tier model.** Each entry in `chains.json` carries a
`tier: "primary" | "secondary"` field:
- **primary** — the strict Bloom-progression sweep (Δ ∈ {1, 2}).
Rendered by default in practice/explore.
- **secondary** — the lenient second-pass coverage sweep
(Δ ∈ {0, 1, 2, 3}; Δ=0 only for shared-scenario pairs). Reachable
via `?chain=<id>` URL deep-links and the "more paths" UI; the
`ChainBadge` shows an inline "alt path" pill when rendering one.
- The legacy exporter emits `chain_tiers` per question alongside
`chain_positions`. A missing `tier` defaults to "primary" on every
read path (validator, TS runtime, UI), which keeps the v1.0
chains.json shape forward-compatible.
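The tier rules above can be sketched as a default-on-read helper plus a per-tier delta check. This reads Δ as the Bloom-level difference between consecutive chain members, which is an assumption; the chain shape, the `levels` mapping, and both function names are hypothetical. The shared-scenario condition on Δ=0 is not modeled here.

```python
# Allowed Bloom-level steps per tier, as stated in the tier model above.
ALLOWED_DELTAS = {"primary": {1, 2}, "secondary": {0, 1, 2, 3}}


def chain_tier(chain):
    """Read a chain's tier, treating a missing field as "primary"."""
    return chain.get("tier", "primary")


def deltas_ok(chain, levels):
    """Check every consecutive pair in the chain against its tier's set.

    `levels` is an assumed mapping of qid -> Bloom level.
    """
    allowed = ALLOWED_DELTAS[chain_tier(chain)]
    qids = chain["questions"]
    return all(levels[b] - levels[a] in allowed for a, b in zip(qids, qids[1:]))


levels = {"q1": 1, "q2": 2, "q3": 2, "q4": 4}
v1_chain = {"id": "c1", "questions": ["q1", "q2", "q4"]}  # no tier field
flat_chain = {"id": "c2", "questions": ["q2", "q3"], "tier": "secondary"}

print(chain_tier(v1_chain))           # primary
print(deltas_ok(v1_chain, levels))    # True  (steps of 1 and 2)
print(deltas_ok(flat_chain, levels))  # True  (step of 0, secondary only)
```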
Tooling that produced these: `diagnose_chain_coverage.py`,
`build_chains_with_gemini.py` (with `--mode {strict,lenient}`),
`merge_chain_passes.py`. See the README's "Chain build pipeline"
section for invocation, and CHAIN_ROADMAP.md for the running log.
The v1 release-pipeline invariants (§3.5 hashing, §5 validators)
absorb these without modification — `chains.json` is a Merkle leaf,
and the new `tier` field flows into that leaf transparently.
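To illustrate why no pipeline change is needed, the sketch below reuses the §3.5 `release_hash` expression and treats `chains.json` as one leaf among others. The leaf hashing (SHA-256 over a canonical JSON dump), the leaf id `"chains.json"`, and the helper names are assumptions made for this example, not the pipeline's actual code.

```python
import hashlib
import json


def leaf_hash(payload: bytes) -> str:
    """Hash one leaf's bytes (assumed: plain SHA-256 hex digest)."""
    return hashlib.sha256(payload).hexdigest()


def release_hash(leaves) -> str:
    """Combine (id, hash) pairs using the expression from section 3.5."""
    joined = b"\n".join(f"{id_}:{h}".encode() for id_, h in leaves)
    return hashlib.sha256(joined).hexdigest()


def chains_leaf(chains) -> str:
    # Canonical serialization is an assumption; the real leaf may hash
    # the file bytes directly.
    return leaf_hash(json.dumps(chains, sort_keys=True).encode())


question_leaf = ("q1", leaf_hash(b"question body"))
v1_0 = [{"id": "c1", "questions": ["q1", "q2"]}]
v1_1 = [{"id": "c1", "questions": ["q1", "q2"], "tier": "secondary"}]

h_before = release_hash([("chains.json", chains_leaf(v1_0)), question_leaf])
h_after = release_hash([("chains.json", chains_leaf(v1_1)), question_leaf])
print(h_before != h_after)
```

Adding or changing `tier` alters only the `chains.json` leaf, so the release hash changes through the existing path with no new leaf type.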
---
## 4. CLI Specification (v2)
Framework: **Typer** (declarative, type-hint-driven). Output: **Rich** (tables, progress, panels).