mirror of
https://github.com/harvard-edge/cs249r_book.git
synced 2026-05-07 18:18:42 -05:00
Tiny edge function (~430 lines, single file) that powers the Ask
Interviewer panel. Default provider is Cloudflare Workers AI
(Llama 3.1 8B) — always available on the free Workers plan, no API
key required. Adding any of five other providers is a one-line
wrangler secret command:
Groq → wrangler secret put GROQ_API_KEY (Llama 3.1 70B)
OpenAI → wrangler secret put OPENAI_API_KEY (GPT-4o mini)
Anthropic → wrangler secret put ANTHROPIC_API_KEY (Claude 3.5 Haiku)
Gemini → wrangler secret put GEMINI_API_KEY (Gemini 1.5 Flash)
OpenRouter → wrangler secret put OPENROUTER_API_KEY (any model)
The first available provider in the priority chain answers; on error or
rate limit the worker falls back to the next. Each adapter is a config
object with vendorLabel, modelLabel, and privacyNote that the client
renders as inline attribution — so vendor sponsorship is automatic.
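The fallback behaviour described above can be sketched roughly like this (the `Adapter` shape and `askWithFallback` name are illustrative assumptions, not the actual types in index.ts):

```typescript
// Minimal sketch of "first available provider wins, fall back on
// error or rate limit". Adapter shape is assumed from the description.
interface Adapter {
  vendorLabel: string;
  modelLabel: string;
  privacyNote: string;
  complete: (prompt: string) => Promise<string>;
}

async function askWithFallback(
  adapters: Adapter[],
  prompt: string,
): Promise<{ answer: string; attribution: Adapter }> {
  let lastErr: unknown;
  for (const adapter of adapters) {
    try {
      const answer = await adapter.complete(prompt);
      // Whichever adapter answered supplies the vendorLabel /
      // modelLabel / privacyNote the client renders inline.
      return { answer, attribution: adapter };
    } catch (err) {
      lastErr = err; // e.g. an HTTP 429 surfaced as an exception
    }
  }
  throw lastErr ?? new Error("no providers configured");
}
```

Because the attribution object travels with the successful adapter, sponsor credit requires no extra bookkeeping on the client.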
Architecture:
* Adapter pattern over 4 request shapes: openai-compat (covers OpenAI,
Groq, OpenRouter, Together, Fireworks, Ollama, LM Studio, vLLM),
anthropic, gemini, cf-workers-ai
* Server-side Socratic system prompt enforced — clients can't bypass
* KV-backed rate limiter (10/IP/hour, 60/IP/day, 8000/global/day,
all configurable via env vars)
* Configurable PROVIDER_PRIORITY env var for sponsor-first ordering
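A plausible sketch of how PROVIDER_PRIORITY could drive sponsor-first ordering (function and field names here are assumptions, not the worker's actual API):

```typescript
// Order adapters by a comma-separated PROVIDER_PRIORITY env var,
// then drop any whose API key / binding is absent. Names are assumed.
interface ProviderAdapter {
  name: string;
  vendorLabel: string;
  modelLabel: string;
  privacyNote: string;
  available: (env: Record<string, string | undefined>) => boolean;
}

function orderProviders(
  adapters: ProviderAdapter[],
  env: Record<string, string | undefined>,
): ProviderAdapter[] {
  // e.g. PROVIDER_PRIORITY = "groq,openai,cf" puts a sponsor first.
  const priority = (env.PROVIDER_PRIORITY ?? "")
    .split(",")
    .map((s) => s.trim())
    .filter(Boolean);
  const byName = new Map(adapters.map((a) => [a.name, a]));
  const ordered = priority
    .map((n) => byName.get(n))
    .filter((a): a is ProviderAdapter => a !== undefined);
  // Adapters not named in PROVIDER_PRIORITY keep their default order.
  for (const a of adapters) if (!ordered.includes(a)) ordered.push(a);
  // Only providers with credentials (or an always-on binding) survive.
  return ordered.filter((a) => a.available(env));
}
```
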
Hardening (Gemini security review):
* Per-turn history content length cap (1000 chars) to prevent OOM /
token waste from oversized payloads
* Request body Content-Length check (16 KB ceiling) before JSON parse
* AbortSignal.timeout(10000) on every upstream fetch via timedFetch()
* Promise.race timeout for the Cloudflare Workers AI binding
* Anthropic and Gemini adapters coalesce consecutive same-role messages
to satisfy strict alternation requirements (this would have caused a
guaranteed 400 on every multi-turn request without the fix)
* Prompt injection mitigation: client-supplied "interviewer" history
turns are stripped server-side; only user clarifications forwarded,
collapsed into a single context block (an attacker controlling the
request body cannot inject fake assistant turns)
* parseIntOrDefault helper guards rate-limit env vars against NaN
fail-open
* Content-Type enforcement (returns 415 if not application/json)
* safeJson<T>() helper handles malformed provider responses cleanly
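The history sanitisation and role-coalescing steps above could look roughly like this (the `Turn` type and both function names are assumptions based on the bullet descriptions):

```typescript
// Sketch: strip client-supplied "interviewer" turns, cap per-turn
// length, then merge consecutive same-role messages so Anthropic's
// and Gemini's strict role alternation is satisfied.
type Role = "user" | "interviewer" | "assistant";
interface Turn {
  role: Role;
  content: string;
}

function sanitizeHistory(history: Turn[], maxLen = 1000): Turn[] {
  // Only user clarifications are forwarded; fake assistant turns
  // injected via the request body are discarded server-side.
  return history
    .filter((t) => t.role === "user")
    .map((t) => ({ role: t.role, content: t.content.slice(0, maxLen) }));
}

function coalesce(turns: Turn[]): Turn[] {
  const out: Turn[] = [];
  for (const t of turns) {
    const last = out[out.length - 1];
    if (last && last.role === t.role) {
      last.content += "\n\n" + t.content; // merge same-role neighbours
    } else {
      out.push({ ...t });
    }
  }
  return out;
}
```

After both passes, all surviving user clarifications collapse into a single context block, which is what makes the injection mitigation and the alternation fix compose cleanly.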
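The three small helpers named in the bullets might be sketched as follows (bodies are plausible reconstructions, not the actual index.ts source):

```typescript
// Guard rate-limit env vars: a missing or malformed value must not
// fail open to NaN (which would disable the limit).
function parseIntOrDefault(value: string | undefined, fallback: number): number {
  const n = Number.parseInt(value ?? "", 10);
  return Number.isFinite(n) && n > 0 ? n : fallback;
}

// Parse a provider response body without letting malformed JSON throw.
async function safeJson<T>(res: Response): Promise<T | null> {
  try {
    return (await res.json()) as T;
  } catch {
    return null;
  }
}

// Every upstream fetch gets a hard deadline via AbortSignal.timeout,
// matching the 10 s cap described above.
function timedFetch(url: string, init: RequestInit = {}, ms = 10_000): Promise<Response> {
  return fetch(url, { ...init, signal: AbortSignal.timeout(ms) });
}
```

`AbortSignal.timeout()` is supported in the Workers runtime and rejects the fetch with a TimeoutError once the deadline passes, so a hung provider cannot stall the request.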
Files:
* worker/wrangler.toml — bindings, KV namespace, observability
* worker/src/index.ts — the whole worker (one file)
* worker/package.json — wrangler + types
* worker/tsconfig.json — strict TS, workers-types
* worker/README.md — quick reference
* WORKER_DEPLOY.md — full deploy guide with provider upgrades
and vendor sponsorship instructions
Worker typechecks clean (exit 0).
worker/tsconfig.json (17 lines, 388 B, JSON):
{
  "compilerOptions": {
    "target": "ES2022",
    "module": "ES2022",
    "moduleResolution": "Bundler",
    "strict": true,
    "esModuleInterop": true,
    "skipLibCheck": true,
    "forceConsistentCasingInFileNames": true,
    "noEmit": true,
    "isolatedModules": true,
    "resolveJsonModule": true,
    "types": ["@cloudflare/workers-types"]
  },
  "include": ["src/**/*.ts"]
}