mirror of
https://github.com/harvard-edge/cs249r_book.git
synced 2026-05-06 01:28:35 -05:00
User concern: preventing commercial reuse of the corpus (e.g., a vendor training a paid product on the questions, selling access to them). CC-BY-NC-4.0 permits research citation + non-commercial derivatives while requiring written permission for commercial use. interviews/vault/questions/LICENSE (NEW) CC-BY-NC-4.0 full text with BibTeX template tied to release_hash. Commercial licensing contact noted. interviews/vault/ARCHITECTURE.md §15 #1 Marked DECIDED. Rationale recorded. vault-cli license intentionally left at historical status (not relicensed as part of this change). interviews/vault/REVIEWS.md License state: DECIDED. Removed from Phase-3 blocker list. interviews/CONTRIBUTING.md New 'License' section: NC constraint explicit. External corpus PRs assumed offered under same CC-BY-NC-4.0. Contact for commercial licensing specified.
191 lines
7.1 KiB
Markdown
191 lines
7.1 KiB
Markdown
# Contributing to StaffML
|
|
|
|
Thanks for your interest. This guide covers contributing to the StaffML vault
|
|
(question corpus) and the site (`interviews/staffml/`).
|
|
|
|
For the full architecture of the vault pipeline, see
|
|
[`vault/ARCHITECTURE.md`](vault/ARCHITECTURE.md). For the review ledger behind
|
|
it, see [`vault/REVIEWS.md`](vault/REVIEWS.md).
|
|
|
|
---
|
|
|
|
## Quick start — from clone to first-question-visible
|
|
|
|
```bash
|
|
# 1. Clone and pick the staffml worktree
|
|
git clone https://github.com/harvard-edge/cs249r_book.git
|
|
cd cs249r_book/interviews
|
|
|
|
# 2. Install vault-cli (Python 3.12+)
|
|
pip install -e vault-cli/[dev]
|
|
vault --version
|
|
pytest vault-cli/tests/
|
|
|
|
# 3. Phase-aware explore
|
|
# At Phase 0 only --version/--help exist. At Phase 1 the corpus is split
|
|
# and `build` / `check` work end-to-end. At Phase 2+, `stats` / `verify`
|
|
# / `export-paper` exist. See §Phase-by-phase scope at the bottom of this
|
|
# file for exactly which subcommands are live at your checkout.
|
|
vault --help # always works — shows subcommands live at your phase
|
|
|
|
# 4. Run the local API shim (requires a prior `vault build` so there's a vault.db)
|
|
# Serves the Worker endpoint surface from a local vault.db so you don't
|
|
# need a Cloudflare account to develop the site. Available from Phase 1 onward.
|
|
vault build # produces interviews/vault/vault.db
|
|
vault api --db interviews/vault/vault.db --port 8002 &
|
|
|
|
# 5. Run the site against your local API
|
|
cd staffml/
|
|
cp .env.example .env.local
|
|
# edit .env.local: NEXT_PUBLIC_VAULT_API=http://localhost:8002
|
|
pnpm install
|
|
pnpm dev
|
|
# visit http://localhost:3000
|
|
```
|
|
|
|
The goal is clone → question-visible in under 10 minutes on a fresh machine. If
|
|
it's longer, file an issue titled "CONTRIBUTING.md getting-started friction".
|
|
|
|
---
|
|
|
|
## What can I contribute?
|
|
|
|
| Contribution type | Where | How |
|
|
|---|---|---|
|
|
| New question | `vault/questions/<track>/<level>/<zone>/` | `vault new` |
|
|
| Fix a question | same | `vault edit <id>` |
|
|
| Reclassify a question | same | `vault move <id> --to <track>/<level>/<zone>` |
|
|
| New topic | `vault/taxonomy.yaml` | PR with a §7 entry in EVOLUTION.md |
|
|
| Website UX | `interviews/staffml/src/` | Next.js; see AGENTS.md in staffml if present |
|
|
| Worker API | `interviews/staffml-vault-worker/` (Phase 3+) | Wrangler project |
|
|
| Schema evolution | `vault/schema/` | RFC-style PR per EVOLUTION.md |
|
|
| New `vault-cli` subcommand | `vault-cli/src/vault_cli/commands/` | Land with tests + docs update |
|
|
|
|
---
|
|
|
|
## Workflow
|
|
|
|
### Branching
|
|
|
|
- Start from `dev` for standalone work: `git checkout -b feat/short-description dev`.
|
|
- One logical change per branch. Atomic commits. No `git add -A` on vault changes.
|
|
- No `Co-Authored-By` tags, no "made with <tool>" footers. Commit messages read like regular engineering work.
|
|
|
|
### Before opening a PR
|
|
|
|
```bash
|
|
vault check --strict # invariants
|
|
pytest vault-cli/tests/ # unit + integration + contract
|
|
vault codegen --check # LinkML ↔ Pydantic/DDL/TS drift check (Phase 1+)
|
|
```
|
|
|
|
CI re-runs these. PRs are merge-blocked on red CI.
|
|
|
|
### PR review
|
|
|
|
- Corpus PRs: at least one maintainer review, CI green.
|
|
- Code PRs (vault-cli, worker): one review, CI green.
|
|
- Schema-evolution PRs: two reviews required once external-contributor onboarding opens (Phase 7+).
|
|
- Schema-breaking PRs must include a migration script under `vault-cli/migrations/`.
|
|
|
|
### Provenance honesty
|
|
|
|
Every question's `provenance` field must honestly reflect how it was made:
|
|
|
|
- `human` — written from scratch by a human.
|
|
- `llm-draft` — produced by `vault generate`; not yet human-reviewed.
|
|
- `llm-then-human-edited` — an LLM draft substantially revised by a human (the
|
|
common case). `generation_meta.human_reviewed_at` records when.
|
|
- `imported` — from an external source (e.g., book, published paper). Include
|
|
source in `tags`.
|
|
|
|
Misattributing LLM content as `human` is a correctness bug, not a style nit.
|
|
|
|
### Author attribution
|
|
|
|
`vault new` populates `authors` from your `git config user.email` via
|
|
`vault/contributors.yaml` (mapping from email → handle). Submit a PR updating
|
|
that file to add yourself.
|
|
|
|
For external PRs: commit signatures (GPG/SSH) or GitHub-verified-email match is
|
|
required — CI rejects `authors:` claims that don't match the committer
|
|
identity.
|
|
|
|
---
|
|
|
|
## Style
|
|
|
|
- Keep questions focused on a **single concept**. "Good napkin math" is usually
|
|
a signal; "grab-bag of facts" is usually a code smell.
|
|
- Use realistic hardware specs — check `mlsysbook/constants.py` and the
|
|
`vault/schema/models.yaml` registry for the canonical values.
|
|
- Paper-cite URL format only: `https://mlsysbook.ai/book/chapters/<slug>`.
|
|
- Scenarios are plaintext; solutions and napkin math can use restricted
|
|
Markdown + KaTeX.
|
|
|
|
---
|
|
|
|
## Things that block external PRs from merging
|
|
|
|
1. **Provenance lie** — `vault mark-exemplar` or `vault promote --reviewed-by`
|
|
fields that don't match git committer.
|
|
2. **Registry mutation** — any commit that deletes lines from
|
|
`id-registry.yaml`. The registry is append-only.
|
|
3. **Schema mixing** — questions at different `schema_version` in the same PR.
|
|
4. **Unsigned schema-evolution PR** — schema bumps require two maintainer
|
|
approvals.
|
|
|
|
---
|
|
|
|
## Phase-by-phase scope
|
|
|
|
As of Phase 0, the vault pipeline is scaffolded but not operational end-to-end.
|
|
What works today:
|
|
|
|
- `vault --version`, `vault --help`.
|
|
- `pip install -e vault-cli/[dev]` and `pytest`.
|
|
- Documentation (ARCHITECTURE.md, REVIEWS.md, TESTING.md, EVOLUTION.md, this file).
|
|
|
|
What's coming per Phase (see [`vault/ARCHITECTURE.md`](vault/ARCHITECTURE.md) §14):
|
|
|
|
- **Phase 1**: `new`, `edit`, `move`, `rm`, `restore`, `build`, `check`, `serve`, `api`. YAML split lands.
|
|
- **Phase 2**: `publish` + primitives, paper-exporter rewrite, rollback-symmetry CI.
|
|
- **Phase 3**: D1 + Worker + `@staffml/vault-types` + FTS5 load-test gate.
|
|
- **Phase 4**: Website cutover + service worker + rollback drill.
|
|
- **Phase 5**: Chain pre-reveal indicator + instrumentation.
|
|
- **Phase 6**: About-page paper prominence.
|
|
|
|
External contributions to `vault/questions/` become feasible at Phase 1 exit.
|
|
|
|
---
|
|
|
|
## License
|
|
|
|
The corpus under `interviews/vault/` is licensed **CC-BY-NC-4.0** — see
|
|
[`vault/questions/LICENSE`](vault/questions/LICENSE). Summary:
|
|
|
|
- **Share and adapt for non-commercial purposes** with attribution.
|
|
- **Commercial use** (training a paid product on the corpus, selling access,
|
|
building paid services around it) requires separate written permission
|
|
from the copyright holders. Contact: `vjreddi@g.harvard.edu`.
|
|
- Contributions to the corpus are assumed to be offered under the same
|
|
CC-BY-NC-4.0 terms. Do not submit content you are not entitled to license
|
|
this way.
|
|
|
|
The `interviews/vault-cli/` tooling is a separate artifact; its license is
|
|
unchanged from the repository's historical state.
|
|
|
|
---
|
|
|
|
## Asking for help
|
|
|
|
- Architecture questions → read [`vault/ARCHITECTURE.md`](vault/ARCHITECTURE.md).
|
|
- "Why was X decided this way?" → check [`vault/REVIEWS.md`](vault/REVIEWS.md).
|
|
Most non-obvious decisions map to a reviewer finding.
|
|
- "Is this bug or intended?" → open an issue with the command you ran and the
|
|
output.
|
|
|
|
---
|
|
|
|
**Thanks for contributing.**
|