7 Commits

Vijay Janapa Reddi
7700726de2 chore(staffml): release polish — drop hash pin, skeletons, error reporting
Three small polish items flagged in the pre-release audit:

1. DROP release_hash pin
   The regression guard in staffml-validate-vault.yml compared vault.db's
   computed release_hash against a pinned value in
   interviews/vault/corpus-equivalence-hash.txt. That pin was load-bearing
   when corpus.json was the source of truth (guarded drift between
   committed-JSON and computed-from-YAMLs hash), but post-v1.0 the YAMLs
   ARE the source of truth and the hash is deterministic from them.
   The pin became a circular check that would bounce every YAML-touching
   PR unless the contributor remembered to manually bump the hash.
   Removed the pin comparison; the step now just runs vault build as a
   reproducibility smoke test. Real integrity still comes from vault
   check --strict + codegen drift earlier in the same workflow.
   Deleted interviews/vault/corpus-equivalence-hash.txt.

2. Hydration SKELETON for scenario
   Summary bundle ships scenario: "" and details with empty strings;
   useFullQuestion fetches the real content from the worker (~100-300ms
   warm, <5s cold). Before this commit the practice + plans pages showed
   a visibly empty region for that hydration window, then popped the
   scenario in — a text-FOUC.
   Added ScenarioSkeleton component (three pulsing bars of approximate
   paragraph height, aria-busy) and rendered it when current.scenario is
   empty on both practice and plans. Layout no longer jumps when real
   text arrives.

3. CLIENT-SIDE ERROR REPORTER
   Silent production regressions (like the getQuestionFullDetail shape
   mismatch in PR #1440) were only discoverable when a user said
   'getting an error'. Added a lightweight error reporter that hooks
   window.error + unhandledrejection, scrubs email patterns, rate-limits
   to 20 unique reports per tab, and pipes into the existing analytics
   worker as 'client_error' events. No new vendor dependency — reuses
   analytics-worker KV storage.
   Worker allowlist extended: adds the 'client_error' event type,
   raises the per-event cap to 8 KiB to fit stack traces, and adds
   message/stack/url/userAgent to the allowed-fields list.
   Installed from Providers.tsx at app mount.
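   The reporter's core behaviors (email scrubbing, the 20-unique-reports
   cap, the two window hooks) could be sketched roughly as below. All
   identifiers (scrubEmails, shouldReport, installErrorReporter) are
   illustrative, not the actual implementation:

   ```typescript
   // Hedged sketch of the client-side error reporter described above.
   const MAX_REPORTS_PER_TAB = 20;
   const seen = new Set<string>();

   export function scrubEmails(text: string): string {
     // Replace anything that looks like an email address.
     return text.replace(/[\w.+-]+@[\w-]+\.[\w.-]+/g, "[email]");
   }

   export function shouldReport(message: string): boolean {
     // Rate-limit: at most 20 *unique* reports per tab.
     if (seen.has(message) || seen.size >= MAX_REPORTS_PER_TAB) return false;
     seen.add(message);
     return true;
   }

   export function installErrorReporter(endpoint: string): void {
     const send = (message: string, stack?: string) => {
       if (!shouldReport(message)) return;
       const body = JSON.stringify({
         type: "client_error",
         message: scrubEmails(message),
         stack: stack ? scrubEmails(stack) : undefined,
         url: location.href,
         userAgent: navigator.userAgent,
       });
       // sendBeacon survives navigation, unlike plain fetch.
       navigator.sendBeacon(endpoint, body);
     };
     window.addEventListener("error", (e) => send(e.message, e.error?.stack));
     window.addEventListener("unhandledrejection", (e) =>
       send(String(e.reason), e.reason?.stack)
     );
   }
   ```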

Build verified green.
2026-04-22 12:17:12 -04:00
Vijay Janapa Reddi
3160a1cee5 feat(analytics): secure StaffML analytics worker and add IRT fields
* Fixed race condition in KV storage causing data loss under concurrent POSTs
* Secured GET /summary endpoint with ADMIN_SECRET auth header
* Added userLevel, industryRole, and yearsExperience to telemetry schema for Item Response Theory (IRT) validation
* Re-balanced vendor representation in paper examples (added AMD MI300X and Intel Gaudi 3)
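The commit does not show the race-condition fix, but a common pattern for lost writes under concurrent POSTs to Cloudflare KV (which has no atomic read-modify-write) is to stop appending to a shared key and instead write each batch under its own unique key. A sketch under that assumption, with an illustrative makeEventKey helper:

```typescript
// Hypothetical sketch: one unique KV key per POSTed batch sidesteps
// the read-modify-write race on a single shared list key.
export function makeEventKey(sessionId: string, nowMs: number): string {
  const nonce = Math.random().toString(36).slice(2, 10);
  return `events:${sessionId}:${nowMs}:${nonce}`;
}

// In the worker's fetch handler (sketch, not the actual code):
// await env.EVENTS_KV.put(makeEventKey(sessionId, Date.now()),
//                         JSON.stringify(batch),
//                         { expirationTtl: 90 * 86400 });
// A summary endpoint can then list keys by the "events:" prefix
// and aggregate on read.
```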
2026-04-08 18:43:02 -04:00
Vijay Janapa Reddi
5955e4a9e2 feat(staffml): complete feedback pipeline with tests and CI
Fix the feedback data round-trip end-to-end:
- QuestionFeedback: dedup guard, aria-pressed, hydrate previous
  feedback on mount, wire Report/Suggest to analytics events
- analytics.ts: computeSummary() aggregates thumbs and difficulty
  with last-write-wins dedup per question+session
- dashboard: new thumbs ratio and difficulty distribution panels
- gauntlet: add QuestionFeedback to per-question review
- progress.ts: include analytics in export/import
- worker.js: server-side summary aggregates feedback with dedup
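The last-write-wins dedup per question+session described for computeSummary() could be sketched as below; the event shape and field names are illustrative assumptions:

```typescript
// Hedged sketch of computeSummary(): keep only the latest feedback
// event per (question, session) pair, then aggregate thumbs.
interface FeedbackEvent {
  questionId: string;
  sessionId: string;
  thumbs: "up" | "down";
  ts: number; // epoch millis
}

export function computeSummary(events: FeedbackEvent[]) {
  // Last-write-wins dedup keyed on question+session.
  const latest = new Map<string, FeedbackEvent>();
  for (const e of events) {
    const key = `${e.questionId}:${e.sessionId}`;
    const prev = latest.get(key);
    if (!prev || e.ts >= prev.ts) latest.set(key, e);
  }
  // Aggregate thumbs per question over the deduped events.
  const summary = new Map<string, { up: number; down: number }>();
  for (const e of latest.values()) {
    const s = summary.get(e.questionId) ?? { up: 0, down: 0 };
    s[e.thumbs] += 1;
    summary.set(e.questionId, s);
  }
  return summary;
}
```

The same dedup rule applied client-side and in the worker keeps the dashboard and the server-side summary in agreement.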

Add Vitest test infrastructure (34 journey tests across 2 files)
and embed type-check + test steps in both CI deploy workflows
so tests gate every build before deployment.
2026-04-05 13:19:02 -04:00
Vijay Janapa Reddi
a14d46f223 feat(staffml): comprehensive analytics — close all 13 tracking gaps
Systematic audit found 13 gaps in analytics coverage. Now tracking:

Session signals:
- session_start with isReturning flag + screenWidth
- search_query with result counts (debounced 1s)

Content quality signals:
- questionId in question_scored (enables per-question IRT)
- hadUserAnswer flag on answer_revealed (reveal-without-typing rate)
- hadUserAnswer on answer_response_time
- star_gate_shown / star_gate_verified (gate drop-off measurement)

Feature usage:
- gauntlet_completed with pct score (was defined but never wired)
- search tracking (what users look for = gold for content gaps)

Worker updated with new event types + allowed fields.
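The 1 s debounce on search_query tracking is a standard trailing-edge debounce; a generic sketch (the actual helper and its name are not shown in the commit):

```typescript
// Hypothetical trailing-edge debounce: only the last call in a burst
// fires, after waitMs of quiet — so a search_query event is emitted
// once per pause in typing, not per keystroke.
export function debounce<T extends unknown[]>(
  fn: (...args: T) => void,
  waitMs: number
): (...args: T) => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: T) => {
    if (timer !== undefined) clearTimeout(timer);
    timer = setTimeout(() => fn(...args), waitMs);
  };
}
```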
2026-04-02 15:15:11 -04:00
Vijay Janapa Reddi
ae278e9b92 feat(staffml): deploy Cloudflare analytics worker + wire CI pipelines
- Deploy analytics worker to mlsysbook.ai/api/staffml-analytics
- KV namespace: bf81298013404118beab61f55afe1d7d
- Add NEXT_PUBLIC_ANALYTICS_URL to both CI workflows
- Events batched client-side every 30s, flushed on page unload
- Worker validates events, strips PII, stores with 90-day TTL
- CORS restricted to mlsysbook.ai, harvard-edge.github.io, localhost
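The 30 s batch / flush-on-unload behavior could be sketched as below; queue, track, and installFlusher are illustrative names, not the shipped code:

```typescript
// Hedged sketch of client-side event batching: events accumulate in a
// queue, flushed every 30 s and once more on page unload.
type AnalyticsEvent = { type: string; ts: number; [k: string]: unknown };

const queue: AnalyticsEvent[] = [];

export function track(event: AnalyticsEvent): void {
  queue.push(event);
}

export function drainQueue(): AnalyticsEvent[] {
  // Remove and return everything currently queued.
  return queue.splice(0, queue.length);
}

export function installFlusher(endpoint: string): void {
  const flush = () => {
    const batch = drainQueue();
    if (batch.length === 0) return;
    // sendBeacon is used because it completes even during unload.
    navigator.sendBeacon(endpoint, JSON.stringify(batch));
  };
  setInterval(flush, 30_000);
  window.addEventListener("pagehide", flush);
}
```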
2026-04-02 15:15:10 -04:00
Vijay Janapa Reddi
3340f5e977 feat(staffml): add response time + napkin grade tracking for IRT calibration
Analytics now captures:
- answer_response_time: seconds spent before revealing, with napkin grade
- question_thumbs: binary quality signal (up/down)
- question_difficulty_feedback: perceived difficulty vs assigned level
- question_contributed: in-app contribution tracking

These signals enable empirical difficulty calibration (IRT) when
aggregated across users. Response time is a more objective difficulty
proxy than self-assessed scores.
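The response-time measurement reduces to timestamping question display and computing elapsed seconds at reveal; a minimal sketch with an illustrative helper and an assumed event payload shape:

```typescript
// Hypothetical helper for answer_response_time: seconds elapsed
// between showing the question and revealing the answer.
export function responseTimeSeconds(shownAtMs: number, revealedAtMs: number): number {
  return Math.round((revealedAtMs - shownAtMs) / 1000);
}

// Assumed payload shape (field names illustrative):
// { type: "answer_response_time",
//   seconds: responseTimeSeconds(shownAt, Date.now()),
//   napkinGrade: "B" }
```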
2026-04-02 07:20:00 -04:00
Vijay Janapa Reddi
098f872821 feat(staffml): 8,891 Qs + backward design + math verification + A100 fix
Corpus: 8,891 published (87.8% validated). Backward design methodology.
A100 constants fixed (FP16: 156→312 TFLOPS). Math verification done.
New figures: backward design chain, applicability matrix. Bibliography
updated (Wiggins, Messick). Verification script added.
2026-04-01 23:53:38 -04:00