- Return 302 to /static/favicon.png instead of streaming the same PNG per
model id so browsers can cache one asset for default avatars.
- Validate stored /static/ paths with decode, normpath, and /static
prefix checks; invalid paths fall back to favicon.
Made-with: Cursor
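The validation steps above (decode, normpath, prefix check, favicon fallback) can be sketched as follows; the function name and fallback constant are illustrative stand-ins, not the actual handler:

```python
import os
from urllib.parse import unquote

def resolve_static_path(stored_path: str, fallback: str = "/static/favicon.png") -> str:
    # Decode percent-escapes first so encoded traversal sequences are
    # visible, collapse ../ segments with normpath, then require the
    # /static/ prefix; anything else falls back to the favicon.
    decoded = unquote(stored_path)
    normalized = os.path.normpath(decoded)
    if normalized.startswith("/static/"):
        return normalized
    return fallback
```

Note the order matters: normalizing before decoding would let `%2e%2e` escape the prefix check.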
* perf(channels): batch user lookup in model_response_handler thread history
The thread-history builder in model_response_handler called
Users.get_user_by_id once per thread message (deduped via an intra-loop
dict), producing N individual SELECTs for a thread of N unique authors.
Replace with a single Users.get_users_by_user_ids call that returns all
authors in one WHERE id IN (...) query, matching the batch pattern
already used elsewhere in this file (lines 739, 804, 1320).
Behavior is preserved: deleted users still resolve to None and fall
through to the existing 'Unknown' fallback via .get().
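A minimal sketch of the batched lookup, with a stubbed in-memory table standing in for the real `Users` model (which issues a single `WHERE id IN (...)` SELECT):

```python
# Stub table; the real Users.get_users_by_user_ids hits the database once.
USERS = {"u1": {"id": "u1", "name": "Alice"}, "u2": {"id": "u2", "name": "Bob"}}

def get_users_by_user_ids(user_ids):
    # One batched lookup instead of one SELECT per id.
    return [USERS[uid] for uid in user_ids if uid in USERS]

def build_thread_history(messages):
    user_ids = list({message["user_id"] for message in messages})
    users_by_id = {user["id"]: user for user in get_users_by_user_ids(user_ids)}
    return [
        {
            "content": message["content"],
            # Deleted users resolve to None via .get() and fall
            # through to the "Unknown" fallback.
            "author": (users_by_id.get(message["user_id"]) or {"name": "Unknown"})["name"],
        }
        for message in messages
    ]
```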
* refactor(channels): rename loop vars to full words per review

Address reviewer feedback to use descriptive names `message` and `user`
instead of single-letter `m` and `u` in the batch user-lookup
comprehensions.
---------
Co-authored-by: Claude <noreply@anthropic.com>
After set_access_grants, the handler was reloading the same knowledge
record via get_knowledge_by_id, which triggers an extra SELECT plus a
nested fetch of access grants. set_access_grants already returns the
newly-written grants and the local knowledge object is otherwise
unchanged, so update it in place and reuse it for the response.
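The shape of the change, sketched with stand-in helpers (the real `set_access_grants` persists the grants and returns the written rows):

```python
from dataclasses import dataclass, field

@dataclass
class Knowledge:  # stub for the knowledge record
    id: str
    access_grants: list = field(default_factory=list)

def set_access_grants(knowledge_id, grants):
    # Writes the grants and returns the newly written rows.
    return grants

def update_access_grants_handler(knowledge, grants):
    written = set_access_grants(knowledge.id, grants)
    # Update the local object in place and reuse it for the response,
    # instead of re-fetching the record (and its nested grants).
    knowledge.access_grants = written
    return knowledge
```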
https://claude.ai/code/session_01S18Lgqbih7Ry2JZUUv8TxF
Co-authored-by: Claude <noreply@anthropic.com>
Pass the request-scoped AsyncSession into Models.get_model_by_id so the
endpoint no longer opens a fresh DB session on every call, avoiding an
extra connection acquisition per profile image request.
Co-authored-by: Claude <noreply@anthropic.com>
* perf(users): drop redundant get_user_by_id refetch in session-user endpoints
Five /user/* handlers refetched the user row via Users.get_user_by_id(user.id)
immediately after receiving an identical UserModel from Depends(get_verified_user).
Since get_verified_user already populated the user within the same request
microseconds earlier, the refetch is pure overhead. The dead else branches
(unreachable — get_verified_user raises 401 on missing user) are removed as
a natural consequence.
Affected endpoints:
- GET /user/settings
- GET /user/status
- POST /user/status/update
- GET /user/info
- POST /user/info/update
Eliminates one SELECT per request to each of these endpoints with no behavioral
change.
* fix(users): preserve USER_NOT_FOUND error on status update failure
update_user_status_by_id returns None when the target user is missing or
the update raises. The previous commit removed the pre-update existence
gate (get_user_by_id) and returned the update result directly, which
turned not-found/failure cases into 200 OK with a null body instead of
the expected 400 USER_NOT_FOUND.
Guard the update result explicitly to preserve the original API contract,
matching the equivalent pattern already applied in /user/info/update.
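A sketch of the guard, with `HTTPException` stubbed in place of FastAPI's and an in-memory table standing in for `Users`:

```python
class HTTPException(Exception):  # stand-in for fastapi.HTTPException
    def __init__(self, status_code, detail):
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail

USER_NOT_FOUND = "User not found"

def update_user_status_by_id(user_id, status, _db={"u1": {"id": "u1", "status": None}}):
    user = _db.get(user_id)
    if user is None:
        return None  # missing user (or a failed update) yields None
    user["status"] = status
    return user

def update_status_handler(user_id, status):
    updated = update_user_status_by_id(user_id, status)
    if updated is None:
        # Preserve the original API contract: 400 USER_NOT_FOUND,
        # not a 200 OK with a null body.
        raise HTTPException(status_code=400, detail=USER_NOT_FOUND)
    return updated
```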
* docs(users): note lost-update tradeoff on /user/info/update
Make the concurrency tradeoff explicit: merging against the auth-time
snapshot slightly widens the lost-update window compared to the previous
pre-merge refetch, but the refetch only narrowed (did not eliminate) that
window. Real safety requires row locking or a version column.
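For illustration, a version-column check of the kind the note alludes to (a hypothetical sketch, not code from this change): the update only applies if the row's version still matches the snapshot, so a concurrent writer makes the stale update fail instead of being silently lost.

```python
rows = {"u1": {"info": {"bio": "hi"}, "version": 1}}

def update_info(user_id, merged_info, expected_version):
    row = rows[user_id]
    if row["version"] != expected_version:
        return False  # lost-update detected; caller should re-read and retry
    row["info"] = merged_info
    row["version"] += 1
    return True
```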
---------
Co-authored-by: Claude <noreply@anthropic.com>
Add configurable reranker batch size (env var RAG_RERANKING_BATCH_SIZE,
default 32) following the same pattern as RAG_EMBEDDING_BATCH_SIZE.
- config.py: PersistentConfig for RAG_RERANKING_BATCH_SIZE
- main.py: import, state init, pass to get_reranking_function
- colbert.py: accept batch_size param in predict() (was hardcoded 32)
- utils.py: get_reranking_function passes batch_size at call time
- retrieval.py: expose in config GET/POST endpoints and ConfigForm
- Documents.svelte: add Reranking Batch Size input in admin settings
Closes #23730
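The env-var plumbing and the batched predict loop can be sketched as below; `rerank` and its `predict` callback are illustrative stand-ins for the colbert.py change, not the actual functions:

```python
import os

# Mirror the RAG_EMBEDDING_BATCH_SIZE pattern: configurable via env var,
# defaulting to the previously hardcoded 32.
RAG_RERANKING_BATCH_SIZE = int(os.environ.get("RAG_RERANKING_BATCH_SIZE", "32"))

def rerank(pairs, predict, batch_size=RAG_RERANKING_BATCH_SIZE):
    """Score query-document pairs in fixed-size batches."""
    scores = []
    for i in range(0, len(pairs), batch_size):
        scores.extend(predict(pairs[i : i + batch_size]))
    return scores
```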
Loader.load() dispatches to the underlying langchain document loaders
(PyMuPDF, Unstructured, python-docx, Tika, …) which are all
synchronous and CPU/IO-bound. process_file() awaited it directly on
the event loop, so parsing a non-trivial PDF/DOCX would freeze the
entire FastAPI app for the duration of the parse — which is what users
experience as "the server hangs whenever I upload a file."
Add an `aload()` async wrapper on Loader that runs the sync load on a
worker thread via asyncio.to_thread, and update process_file() to
await it. The sync API is preserved so existing callers that already
run inside run_in_threadpool (e.g. save_docs_to_vector_db) are
unaffected.
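The wrapper is a one-liner over `asyncio.to_thread`; sketched here with a stubbed sync parse standing in for the langchain loaders:

```python
import asyncio
import time

class Loader:
    def load(self, filename: str) -> list:
        # Stand-in for the sync, blocking document parse
        # (PyMuPDF, Unstructured, python-docx, Tika, ...).
        time.sleep(0.01)
        return [f"parsed:{filename}"]

    async def aload(self, filename: str) -> list:
        # Run the sync parse on a worker thread so the event loop
        # stays free to serve other requests during the parse.
        return await asyncio.to_thread(self.load, filename)
```

Existing sync callers keep using `load()` unchanged; only async contexts switch to `await loader.aload(...)`.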
https://claude.ai/code/session_01JSr4NZSskEUQvoJnavVXh8
Co-authored-by: Claude <noreply@anthropic.com>
* fix(retrieval): offload sync VECTOR_DB_CLIENT calls in async paths via AsyncVectorDBClient
The vector DB backends (Chroma, pgvector, Qdrant, Milvus, Pinecone,
Weaviate, …) are uniformly synchronous and their methods perform
blocking network or disk I/O. Multiple async route handlers and helpers
were calling them directly on the event loop — file processing,
memories, knowledge bases, hybrid search bookkeeping — so a single
upsert/delete/search would freeze every other in-flight request for the
duration of the call.
Introduce `AsyncVectorDBClient`, a thin async facade that wraps the
existing sync client and dispatches each method through
`asyncio.to_thread`. It mirrors `VectorDBBase` exactly and forwards
*args/**kwargs so backend-specific extra parameters keep working.
Update every async-context call site (routers/retrieval, routers/files,
routers/memories, routers/knowledge, retrieval/utils,
tools/builtin) to await `ASYNC_VECTOR_DB_CLIENT` instead of calling the
sync client directly. Two helpers that were sync-only also acquire
async siblings or are awaited via `asyncio.to_thread` at their async
call site (`remove_knowledge_base_metadata_embedding`,
`get_all_items_from_collections`, `query_doc`).
The original sync `VECTOR_DB_CLIENT` is unchanged, so callers that
already run inside `run_in_threadpool` (e.g. `save_docs_to_vector_db`
and the sync `query_doc`/`get_doc` helpers) are unaffected.
https://claude.ai/code/session_01JSr4NZSskEUQvoJnavVXh8
* fix(retrieval): restore explicit AsyncVectorDBClient signatures matching VectorDBBase
Per PR review: the original *args/**kwargs forwarding lost type
safety and IDE/static-analysis support. Restore explicit signatures
that mirror VectorDBBase exactly, so:
* Bad kwargs fail at the facade boundary instead of inside the
worker thread (where the resulting TypeError tends to be
swallowed by surrounding `try/except`).
* IDE autocomplete and static analysis work as expected.
* The stated intent ("mirror VectorDBBase exactly") now holds at
the API contract level, not just behaviourally.
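The facade shape, sketched with a stub backend (method names and parameters here are illustrative, modeled on the `VectorDBBase` description above):

```python
import asyncio

class VectorDBClient:
    # Stand-in sync backend; real backends block on network or disk I/O.
    def search(self, collection_name: str, vectors: list, limit: int):
        return {"collection": collection_name, "limit": limit}

VECTOR_DB_CLIENT = VectorDBClient()

class AsyncVectorDBClient:
    """Async facade with explicit signatures mirroring the sync base
    class, each method dispatching through asyncio.to_thread."""

    def __init__(self, sync_client):
        self.sync = sync_client  # escape hatch for backend-specific calls

    async def search(self, collection_name: str, vectors: list, limit: int):
        # A bad kwarg now fails here, at the facade boundary, instead of
        # inside the worker thread where the TypeError could be swallowed.
        return await asyncio.to_thread(
            self.sync.search, collection_name, vectors, limit
        )

ASYNC_VECTOR_DB_CLIENT = AsyncVectorDBClient(VECTOR_DB_CLIENT)
```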
While doing this, surface a pre-existing bug in
`delete_entries_from_collection` that the stricter typing flagged:
the call passed `metadata={'hash': hash}` which is not a parameter
on `VectorDBBase.delete` nor any backend. The TypeError raised
inside the sync delete was silently swallowed by `except Exception`
so the endpoint always reported `{'status': False}` for every
request instead of actually deleting matching vectors. Replace with
`filter=...` to do what the endpoint name promises.
The review's other note (no concurrency/backpressure on
the shared default threadpool) is intentionally not addressed here:
asyncio.to_thread on the shared executor is the right primitive for
this use case; per-domain bounded executors would add lifecycle
complexity disproportionate to the problem and the loop is no
longer blocked, which was the actual bug.
https://claude.ai/code/session_01JSr4NZSskEUQvoJnavVXh8
* fix(retrieval): parallelize hybrid-search collection prefetch; document async facade contracts
Address PR review findings:
1. Hybrid-search prefetch was sequential
`query_collection_with_hybrid_search` previously awaited
`ASYNC_VECTOR_DB_CLIENT.get(name)` once per collection in a for
loop. Each call already off-loaded to a worker thread, but
awaiting them serially meant total prefetch latency scaled
linearly with the number of collections. Run them concurrently
with `asyncio.gather` so multi-collection queries actually
benefit from the threadpool. Per-collection exception handling
is preserved by wrapping each fetch in a small helper that
logs and returns `(name, None)` on failure, so a single bad
collection cannot poison the whole gather.
2. Document the thread-safety expectation explicitly
The facade now formally states what was always implicit: the
sync `VECTOR_DB_CLIENT` is shared across worker threads, so the
underlying backend driver must be thread-safe. This is not a
new exposure — `save_docs_to_vector_db` already called the sync
client from `run_in_threadpool`. Adding a global lock here
would defeat the responsiveness the facade exists to provide;
backends that cannot tolerate concurrent access should grow
their own internal serialization.
3. Document the API-surface choice and `.sync` escape hatch
The strict `VectorDBBase` mirror was a deliberate choice (the
previous `*args/**kwargs` revision let a `metadata=` typo
silently break an endpoint). Document it, and call out the
`.sync` escape hatch with an example for callers that genuinely
need a backend-specific parameter not on `VectorDBBase`.
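The concurrent prefetch from item 1 can be sketched as below; `fetch_collection` is a stand-in for `ASYNC_VECTOR_DB_CLIENT.get(name)`:

```python
import asyncio

async def fetch_collection(name: str):
    # Stand-in for ASYNC_VECTOR_DB_CLIENT.get(name).
    if name == "bad":
        raise RuntimeError("backend error")
    return {"name": name}

async def prefetch(collection_names: list):
    async def safe_get(name: str):
        try:
            return name, await fetch_collection(name)
        except Exception:
            # One bad collection must not poison the whole gather;
            # log and return (name, None) instead.
            return name, None

    # Run all fetches concurrently so prefetch latency no longer
    # scales linearly with the number of collections.
    results = await asyncio.gather(*(safe_get(n) for n in collection_names))
    return dict(results)
```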
https://claude.ai/code/session_01JSr4NZSskEUQvoJnavVXh8
* fix(retrieval): guard /delete against null file.hash and let HTTPException reach the client
Address PR review finding on the `metadata=` → `filter=` change in
`delete_entries_from_collection`.
The new `filter={'hash': hash}` query was correct for files that
have a hash, but did not handle `file.hash is None` (unprocessed,
failed, or legacy records). The match semantics of a null filter
value are backend-dependent — some ignore the key entirely, some
treat it as "metadata field absent" and match every such row — so
issuing the query risked deleting unrelated entries.
* Reject `hash is None` up front with a 400 explaining the file
has no hash to target.
* Narrow the surrounding `except Exception` so it no longer
swallows `HTTPException`. Without this fix the new 400 (and the
pre-existing 404 for missing files) would be silently re-shaped
into `{'status': False}` and the caller could not distinguish a
bad-request input from a backend error.
https://claude.ai/code/session_01JSr4NZSskEUQvoJnavVXh8
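Both fixes together, sketched with `HTTPException` stubbed and a list standing in for the vector collection (names are illustrative, not the endpoint's actual code):

```python
class HTTPException(Exception):  # stand-in for fastapi.HTTPException
    def __init__(self, status_code, detail):
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail

def delete_by_hash(collection, file_hash):
    # Stand-in for the backend delete with filter={'hash': file_hash}.
    collection[:] = [e for e in collection if e["hash"] != file_hash]

def delete_entries_from_collection(collection, file_hash):
    try:
        if file_hash is None:
            # Unprocessed/failed/legacy records: a null filter value is
            # backend-dependent and could match unrelated entries.
            raise HTTPException(400, "File has no hash to target")
        delete_by_hash(collection, file_hash)
        return {"status": True}
    except HTTPException:
        raise  # let the 400/404 reach the client instead of re-shaping it
    except Exception:
        return {"status": False}
```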
---------
Co-authored-by: Claude <noreply@anthropic.com>