Three small polish items flagged in the pre-release audit:
1. Drop release_hash pin
The regression guard in staffml-validate-vault.yml compared vault.db's
computed release_hash against a pinned value in
interviews/vault/corpus-equivalence-hash.txt. That pin was load-bearing
when corpus.json was the source of truth (guarded drift between
committed-JSON and computed-from-YAMLs hash), but post-v1.0 the YAMLs
ARE the source of truth and the hash is deterministic from them.
The pin became a circular check that would bounce every YAML-touching
PR unless the contributor remembered to manually bump the hash.
Removed the pin comparison; the step now just runs vault build as a
reproducibility smoke test. Real integrity checking still comes from
vault check --strict plus the codegen drift check earlier in the same
workflow.
Deleted interviews/vault/corpus-equivalence-hash.txt.
2. Hydration skeleton for scenario
The summary bundle ships scenario: "" and empty-string details;
useFullQuestion fetches the real content from the worker (~100-300ms
warm, <5s cold). Before this commit the practice and plans pages
showed a visibly empty region for that hydration window, then popped
the scenario in: a text FOUC.
Added ScenarioSkeleton component (three pulsing bars of approximate
paragraph height, aria-busy) and rendered it when current.scenario is
empty on both practice and plans. Layout no longer jumps when real
text arrives.
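A minimal sketch of the component, assuming Tailwind-style utility
classes; the exact markup, class names, and bar count in the real
component may differ:

```tsx
// ScenarioSkeleton: placeholder shown while useFullQuestion hydrates the scenario.
// Minimal sketch; class names and bar sizing are illustrative assumptions.
export function ScenarioSkeleton() {
  return (
    <div aria-busy="true" aria-label="Loading scenario" className="animate-pulse space-y-2">
      <div className="h-4 w-full rounded bg-gray-200" />
      <div className="h-4 w-11/12 rounded bg-gray-200" />
      <div className="h-4 w-3/4 rounded bg-gray-200" />
    </div>
  );
}

// Usage on practice/plans (illustrative):
// {current.scenario === "" ? <ScenarioSkeleton /> : <p>{current.scenario}</p>}
```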
3. Client-side error reporter
Silent production regressions (like the getQuestionFullDetail shape
mismatch in PR #1440) were only discoverable when a user said
'getting an error'. Added a lightweight error reporter that hooks
window.error + unhandledrejection, scrubs email patterns, rate-limits
to 20 unique reports per tab, and pipes into the existing analytics
worker as 'client_error' events. No new vendor dependency; it reuses
the analytics-worker KV storage.
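A minimal sketch of the reporter, assuming NEXT_PUBLIC_ANALYTICS_URL
as the endpoint and sendBeacon as transport; the scrub regex, dedup
key, and payload fields are illustrative, not the exact implementation:

```ts
// Client-side error reporter sketch: hooks global error events, scrubs emails,
// caps unique reports per tab, and forwards to the analytics worker.
// Endpoint, field names, and limits here are illustrative assumptions.
const MAX_REPORTS = 20;
const seen = new Set<string>();
const EMAIL_RE = /[\w.+-]+@[\w-]+\.[\w.-]+/g;

function report(message: string, stack?: string) {
  const scrubbed = message.replace(EMAIL_RE, "[email]");
  const key = scrubbed.slice(0, 200); // dedup on the scrubbed message
  if (seen.has(key) || seen.size >= MAX_REPORTS) return;
  seen.add(key);
  navigator.sendBeacon(
    process.env.NEXT_PUBLIC_ANALYTICS_URL ?? "",
    JSON.stringify({
      type: "client_error",
      message: scrubbed,
      stack: stack?.replace(EMAIL_RE, "[email]"),
      url: location.pathname,
      userAgent: navigator.userAgent,
    })
  );
}

export function installErrorReporter() {
  window.addEventListener("error", (e) => report(e.message, e.error?.stack));
  window.addEventListener("unhandledrejection", (e) =>
    report(String(e.reason?.message ?? e.reason), e.reason?.stack)
  );
}
```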
Worker allowlist extended: adds the 'client_error' event type, raises
the per-event cap to 8 KiB so stack traces fit, and adds
message/stack/url/userAgent to the allowed-fields list.
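On the worker side the change is roughly this shape (constant names
are assumptions, not the actual identifiers in worker.js):

```ts
// Sketch of the worker allowlist extension; actual names in worker.js may differ.
const ALLOWED_EVENT_TYPES = new Set([
  /* ...existing event types... */
  "client_error",
]);

const ALLOWED_FIELDS = new Set([
  /* ...existing fields... */
  "message", "stack", "url", "userAgent",
]);

const MAX_EVENT_BYTES = 8 * 1024; // raised so scrubbed stack traces fit
```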
Installed from Providers.tsx at app mount.
Build verified green.
* Fixed race condition in KV storage causing data loss under concurrent POSTs
* Secured GET /summary endpoint with ADMIN_SECRET auth header (see the sketch after this list)
* Added userLevel, industryRole, and yearsExperience to telemetry schema for Item Response Theory (IRT) validation
* Re-balanced vendor representation in paper examples (added AMD MI300X and Intel Gaudi 3)
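A sketch of the /summary guard mentioned above; only ADMIN_SECRET
comes from the change itself, the header name and response shape are
assumptions:

```ts
// Sketch of the GET /summary auth guard; header name and error shape are assumptions.
// env.ADMIN_SECRET is the worker secret binding named in the change above.
function authorizeSummary(request: Request, env: { ADMIN_SECRET: string }): Response | null {
  const provided = request.headers.get("x-admin-secret");
  if (!provided || provided !== env.ADMIN_SECRET) {
    return new Response("Unauthorized", { status: 401 });
  }
  return null; // authorized; caller proceeds to compute the summary
}
```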
Fix the feedback data round-trip end-to-end:
- QuestionFeedback: dedup guard, aria-pressed, hydrate previous
feedback on mount, wire Report/Suggest to analytics events
- analytics.ts: computeSummary() aggregates thumbs and difficulty
with last-write-wins dedup per question+session (see the sketch after
this list)
- dashboard: new thumbs ratio and difficulty distribution panels
- gauntlet: add QuestionFeedback to per-question review
- progress.ts: include analytics in export/import
- worker.js: server-side summary aggregates feedback with dedup
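A sketch of the last-write-wins dedup in computeSummary(); the event
and summary shapes are assumptions based on the items above, not the
real analytics.ts types:

```ts
// Sketch of last-write-wins dedup per question+session, then aggregation.
interface FeedbackEvent {
  questionId: string;
  sessionId: string;
  ts: number;
  thumbs?: "up" | "down";
  difficulty?: "easier" | "as_expected" | "harder";
}

export function computeSummary(events: FeedbackEvent[]) {
  // Last write wins: keep only the newest event per question+session.
  const latest = new Map<string, FeedbackEvent>();
  for (const e of events) {
    const key = `${e.questionId}:${e.sessionId}`;
    const prev = latest.get(key);
    if (!prev || e.ts >= prev.ts) latest.set(key, e);
  }

  // Aggregate the deduped events per question.
  const summary = new Map<string, { up: number; down: number; difficulty: Record<string, number> }>();
  for (const e of latest.values()) {
    const s = summary.get(e.questionId) ?? { up: 0, down: 0, difficulty: {} };
    if (e.thumbs === "up") s.up++;
    if (e.thumbs === "down") s.down++;
    if (e.difficulty) s.difficulty[e.difficulty] = (s.difficulty[e.difficulty] ?? 0) + 1;
    summary.set(e.questionId, s);
  }
  return summary;
}
```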
Add Vitest test infrastructure (34 journey tests across 2 files)
and embed type-check + test steps in both CI deploy workflows
so tests gate every build before deployment.
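One journey test might look roughly like this (module path, event
shape, and test name are illustrative, not one of the actual 34
tests; it exercises the computeSummary sketch above):

```ts
// Illustrative Vitest journey test; not taken from the actual test files.
import { describe, it, expect } from "vitest";
import { computeSummary } from "./analytics"; // assumed module path

describe("feedback journey", () => {
  it("keeps only the latest feedback per question+session", () => {
    const summary = computeSummary([
      { questionId: "q1", sessionId: "s1", ts: 1, thumbs: "down" },
      { questionId: "q1", sessionId: "s1", ts: 2, thumbs: "up" },
    ]);
    expect(summary.get("q1")).toMatchObject({ up: 1, down: 0 });
  });
});
```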
Systematic audit found 13 gaps in analytics coverage. Now tracking:
Session signals:
- session_start with isReturning flag + screenWidth
- search_query with result counts (debounced 1s; sketch below)
Content quality signals:
- questionId in question_scored (enables per-question IRT)
- hadUserAnswer flag on answer_revealed (reveal-without-typing rate)
- hadUserAnswer on answer_response_time
- star_gate_shown / star_gate_verified (gate drop-off measurement)
Feature usage:
- gauntlet_completed with pct score (was defined but never wired)
- search tracking (what users look for = gold for content gaps)
Worker updated with new event types + allowed fields.
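The search_query tracking above, roughly; track() and the field names
are assumptions:

```ts
// Sketch of debounced search_query tracking; track() and field names are assumptions.
import { track } from "./analytics"; // assumed existing event helper

let searchTimer: ReturnType<typeof setTimeout> | undefined;

export function onSearchInput(query: string, resultCount: number) {
  clearTimeout(searchTimer);
  searchTimer = setTimeout(() => {
    if (query.trim().length === 0) return;
    track("search_query", { query: query.trim(), resultCount });
  }, 1000); // 1s debounce so keystrokes collapse into one event
}
```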
- Deploy analytics worker to mlsysbook.ai/api/staffml-analytics
- KV namespace: bf81298013404118beab61f55afe1d7d
- Add NEXT_PUBLIC_ANALYTICS_URL to both CI workflows
- Events batched client-side every 30s, flushed on page unload (sketch below)
- Worker validates events, strips PII, stores with 90-day TTL
- CORS restricted to mlsysbook.ai, harvard-edge.github.io, localhost
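The client-side batching described above, as a sketch; the queue
shape, track() signature, and pagehide flush are assumptions:

```ts
// Sketch of the client-side batcher: queue events, flush every 30s,
// and flush on page unload via sendBeacon. Names are illustrative.
type AnalyticsEvent = { type: string; ts: number; [k: string]: unknown };

const queue: AnalyticsEvent[] = [];
const ENDPOINT = process.env.NEXT_PUBLIC_ANALYTICS_URL ?? "";

export function track(type: string, fields: Record<string, unknown> = {}) {
  queue.push({ type, ts: Date.now(), ...fields });
}

function flush() {
  if (queue.length === 0 || !ENDPOINT) return;
  const batch = queue.splice(0, queue.length); // drain the queue
  navigator.sendBeacon(ENDPOINT, JSON.stringify({ events: batch }));
}

if (typeof window !== "undefined") {
  setInterval(flush, 30_000);                  // periodic 30s flush
  window.addEventListener("pagehide", flush);  // flush on page unload
}
```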
Analytics now captures:
- answer_response_time: seconds spent before revealing, with napkin grade
- question_thumbs: binary quality signal (up/down)
- question_difficulty_feedback: perceived difficulty vs assigned level
- question_contributed: in-app contribution tracking
These signals enable empirical difficulty calibration (IRT) when
aggregated across users. Response time is a more objective difficulty
proxy than self-assessed scores.
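Rough payload shapes these events imply; fields not named in the
bullets above are assumptions:

```ts
// Rough payload shapes for the four events; fields beyond those described are assumptions.
type AnswerResponseTime = {
  type: "answer_response_time";
  questionId: string;
  seconds: number;       // time spent before revealing the answer
  napkinGrade?: string;  // self-graded napkin answer, if present
};
type QuestionThumbs = {
  type: "question_thumbs";
  questionId: string;
  thumbs: "up" | "down"; // binary quality signal
};
type QuestionDifficultyFeedback = {
  type: "question_difficulty_feedback";
  questionId: string;
  assignedLevel: string; // level the question is tagged with
  perceivedDifficulty: "easier" | "as_expected" | "harder";
};
type QuestionContributed = {
  type: "question_contributed";
  questionId?: string;
};
```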