Avoid loading the full Chat row (including the potentially large `chat`
JSON column) just to read tag IDs from `meta.tags`. Issue a narrow
SELECT on `Chat.meta` instead, which is much cheaper for chats with
large message histories.
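The difference can be sketched as follows. This is a minimal illustration assuming SQLAlchemy 2.0-style ORM models; the `Chat` model and column names here are simplified stand-ins, not the real schema.

```python
from sqlalchemy import JSON, Column, String, create_engine, select
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Chat(Base):
    __tablename__ = "chat"
    id = Column(String, primary_key=True)
    chat = Column(JSON)  # potentially large message history
    meta = Column(JSON)  # small: tags, pinned flag, etc.

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Chat(id="c1",
                     chat={"messages": ["..."] * 1000},
                     meta={"tags": ["work", "draft"]}))
    session.commit()

    # Before: session.get(Chat, "c1") loads the whole row, including
    # the large `chat` column, just to read meta["tags"].
    # After: a narrow SELECT fetches only the small `meta` column.
    meta = session.execute(
        select(Chat.meta).where(Chat.id == "c1")
    ).scalar_one()
    tags = meta.get("tags", [])
```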
Co-authored-by: Claude <noreply@anthropic.com>
Convert chat_message_file_ids from list to set so the membership test
in the comprehension is O(1) instead of O(m), turning the dedupe from
O(n*m) into O(n+m). Also replace the redundant set([...]) with a set
comprehension.
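The shape of the change, sketched with plain string IDs (the real element type may differ):

```python
def dedupe_file_ids(candidate_ids: list[str],
                    chat_message_file_ids: list[str]) -> list[str]:
    # A set gives O(1) membership tests, so the dedupe is O(n + m)
    # instead of O(n * m); the set comprehension also replaces the
    # redundant set([...]) construction.
    seen = {fid for fid in chat_message_file_ids}
    return [fid for fid in candidate_ids if fid not in seen]

deduped = dedupe_file_ids(["a", "b", "c"], ["b"])
```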
https://claude.ai/code/session_01Le3NnqNhcgaJvFrDZGqmwe
Co-authored-by: Claude <noreply@anthropic.com>
Pass the request-scoped AsyncSession into Models.get_model_by_id so the
endpoint no longer opens a fresh DB session on every call, avoiding an
extra connection acquisition per profile image request.
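The session-reuse pattern, as a self-contained sketch. The session type and `get_model_by_id` body here are stand-ins (a dict plays the role of the AsyncSession); only the pass-through shape matters.

```python
import asyncio
from contextlib import asynccontextmanager

SESSIONS_OPENED = 0

@asynccontextmanager
async def new_session():
    # Stand-in for acquiring a fresh AsyncSession/connection from the pool.
    global SESSIONS_OPENED
    SESSIONS_OPENED += 1
    yield {"m1": {"id": "m1", "profile_image_url": "/static/m1.png"}}

async def get_model_by_id(model_id, db=None):
    # Reuse the caller's request-scoped session when one is passed in;
    # only open a new session for callers that have none.
    if db is not None:
        return db.get(model_id)
    async with new_session() as db:
        return db.get(model_id)

async def handle_profile_image_request():
    async with new_session() as request_db:  # the request-scoped session
        return await get_model_by_id("m1", db=request_db)

model = asyncio.run(handle_profile_image_request())
```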
Co-authored-by: Claude <noreply@anthropic.com>
* perf(users): drop redundant get_user_by_id refetch in session-user endpoints
Five /user/* handlers refetched the user row via Users.get_user_by_id(user.id)
immediately after receiving an identical UserModel from Depends(get_verified_user).
Since get_verified_user already populated the user within the same request
microseconds earlier, the refetch is pure overhead. The dead else branches
(unreachable — get_verified_user raises 401 on missing user) are removed as
a natural consequence.
Affected endpoints:
- GET /user/settings
- GET /user/status
- POST /user/status/update
- GET /user/info
- POST /user/info/update
Eliminates one SELECT per request to each of these endpoints with no behavioral
change.
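The before/after shape, reduced to a runnable sketch (the SELECT counter stands in for the database round trip; handler and dependency names are simplified):

```python
SELECTS = 0

class UserModel:
    def __init__(self, uid, settings):
        self.id = uid
        self.settings = settings

def get_user_by_id(uid):
    global SELECTS
    SELECTS += 1  # stands in for one SELECT on the user table
    return UserModel(uid, {"theme": "dark"})

def get_verified_user():
    # Dependency: resolves the user once per request and raises 401 if
    # missing, so handlers never receive None.
    return get_user_by_id("u1")

def user_settings_before(user):
    user = get_user_by_id(user.id)  # redundant refetch of the identical row
    return user.settings

def user_settings_after(user):
    return user.settings  # trust the dependency-injected UserModel

user = get_verified_user()
settings = user_settings_after(user)
```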
* fix(users): preserve USER_NOT_FOUND error on status update failure
update_user_status_by_id returns None when the target user is missing or
the update raises. The previous commit removed the pre-update existence
gate (get_user_by_id) and returned the update result directly, which
turned not-found/failure cases into 200 OK with a null body instead of
the expected 400 USER_NOT_FOUND.
Guard the update result explicitly to preserve the original API contract,
matching the equivalent pattern already applied in /user/info/update.
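The guard, in miniature. `LookupError` stands in for `HTTPException(400, "USER_NOT_FOUND")` and the dict for the user table; the point is that a `None` update result must become an error, not a 200:

```python
def update_user_status_by_id(uid, status, db):
    # Returns None when the target user is missing (or, in the real
    # code, when the UPDATE raises).
    row = db.get(uid)
    if row is None:
        return None
    row["status"] = status
    return row

def update_status_endpoint(uid, status, db):
    updated = update_user_status_by_id(uid, status, db)
    if updated is None:
        # Preserve the original contract: 400 USER_NOT_FOUND,
        # not 200 OK with a null body.
        raise LookupError("USER_NOT_FOUND")
    return updated

db = {"u1": {"status": "away"}}
result = update_status_endpoint("u1", "online", db)
try:
    update_status_endpoint("ghost", "online", db)
    missing_rejected = False
except LookupError:
    missing_rejected = True
```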
* docs(users): note lost-update tradeoff on /user/info/update
Make the concurrency tradeoff explicit: merging against the auth-time
snapshot slightly widens the lost-update window compared to the previous
pre-merge refetch, but the refetch only narrowed (did not eliminate) that
window. Real safety requires row locking or a version column.
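A version-column check could look roughly like this. This in-memory sketch only illustrates the compare-and-bump logic; in a real database the check and increment must happen atomically in one `UPDATE ... WHERE version = :expected` statement, not as two Python steps.

```python
def update_info_versioned(db, uid, new_info, expected_version):
    # Optimistic locking: apply the merge only if the version still
    # matches what the caller read; otherwise signal a conflict so the
    # caller re-reads and retries, instead of silently losing an update.
    row = db.get(uid)
    if row is None or row["version"] != expected_version:
        return None
    row["info"] = {**row["info"], **new_info}
    row["version"] += 1
    return row

db = {"u1": {"info": {"bio": "old"}, "version": 3}}
ok = update_info_versioned(db, "u1", {"bio": "new"}, expected_version=3)
stale = update_info_versioned(db, "u1", {"bio": "lost"}, expected_version=3)
```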
---------
Co-authored-by: Claude <noreply@anthropic.com>
Add configurable reranker batch size (env var RAG_RERANKING_BATCH_SIZE,
default 32) following the same pattern as RAG_EMBEDDING_BATCH_SIZE.
- config.py: PersistentConfig for RAG_RERANKING_BATCH_SIZE
- main.py: import, state init, pass to get_reranking_function
- colbert.py: accept batch_size param in predict() (was hardcoded 32)
- utils.py: get_reranking_function passes batch_size at call time
- retrieval.py: expose in config GET/POST endpoints and ConfigForm
- Documents.svelte: add Reranking Batch Size input in admin settings
Closes #23730
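The batching itself reduces to this pattern (function and variable names here are illustrative; the real code wires the value through PersistentConfig and app state rather than reading the env var at call time):

```python
import os

# Same pattern as RAG_EMBEDDING_BATCH_SIZE: env-configurable, default 32.
RAG_RERANKING_BATCH_SIZE = int(os.environ.get("RAG_RERANKING_BATCH_SIZE", "32"))

def predict(pairs, score_fn, batch_size=RAG_RERANKING_BATCH_SIZE):
    # Score query/document pairs in configurable batches instead of a
    # hardcoded 32, so memory use can be tuned to the reranker hardware.
    scores = []
    for start in range(0, len(pairs), batch_size):
        scores.extend(score_fn(pairs[start:start + batch_size]))
    return scores

scores = predict(list(range(70)), lambda batch: [x * 2 for x in batch])
```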
Loader.load() dispatches to the underlying langchain document loaders
(PyMuPDF, Unstructured, python-docx, Tika, …) which are all
synchronous and CPU/IO-bound. process_file() awaited it directly on
the event loop, so parsing a non-trivial PDF/DOCX would freeze the
entire FastAPI app for the duration of the parse — which is what users
experience as "the server hangs whenever I upload a file."
Add an `aload()` async wrapper on Loader that runs the sync load on a
worker thread via asyncio.to_thread, and update process_file() to
await it. The sync API is preserved so existing callers that already
run inside run_in_threadpool (e.g. save_docs_to_vector_db) are
unaffected.
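The wrapper is small; a minimal sketch (with `time.sleep` standing in for the blocking parse):

```python
import asyncio
import time

class Loader:
    def load(self):
        # Synchronous, CPU/IO-bound parse (PyMuPDF, Unstructured, ...).
        time.sleep(0.05)  # stands in for parsing a large PDF/DOCX
        return [{"page_content": "parsed text"}]

    async def aload(self):
        # Run the sync load on a worker thread so the event loop stays
        # responsive for the duration of the parse.
        return await asyncio.to_thread(self.load)

async def process_file():
    docs = await Loader().aload()  # no longer blocks the event loop
    return docs

docs = asyncio.run(process_file())
```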
https://claude.ai/code/session_01JSr4NZSskEUQvoJnavVXh8
Co-authored-by: Claude <noreply@anthropic.com>
* fix(retrieval): offload sync VECTOR_DB_CLIENT calls in async paths via AsyncVectorDBClient
The vector DB backends (Chroma, pgvector, Qdrant, Milvus, Pinecone,
Weaviate, …) are uniformly synchronous and their methods perform
blocking network or disk I/O. Multiple async route handlers and helpers
were calling them directly on the event loop — file processing,
memories, knowledge bases, hybrid search bookkeeping — so a single
upsert/delete/search would freeze every other in-flight request for the
duration of the call.
Introduce `AsyncVectorDBClient`, a thin async facade that wraps the
existing sync client and dispatches each method through
`asyncio.to_thread`. It mirrors `VectorDBBase` exactly and forwards
*args/**kwargs so backend-specific extra parameters keep working.
Update every async-context call site (routers/retrieval, routers/files,
routers/memories, routers/knowledge, retrieval/utils,
tools/builtin) to await `ASYNC_VECTOR_DB_CLIENT` instead of calling the
sync client directly. Two helpers that were sync-only also acquire
async siblings or are awaited via `asyncio.to_thread` at their async
call site (`remove_knowledge_base_metadata_embedding`,
`get_all_items_from_collections`, `query_doc`).
The original sync `VECTOR_DB_CLIENT` is unchanged, so callers that
already run inside `run_in_threadpool` (e.g. `save_docs_to_vector_db`
and the sync `query_doc`/`get_doc` helpers) are unaffected.
https://claude.ai/code/session_01JSr4NZSskEUQvoJnavVXh8
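The facade pattern, reduced to two methods with a toy in-memory backend (the real client exposes the full `VectorDBBase` surface):

```python
import asyncio

class VectorDBClient:
    # Sync backend: in reality these methods do blocking network/disk I/O.
    def __init__(self):
        self.store = {}

    def upsert(self, collection_name, items):
        self.store.setdefault(collection_name, []).extend(items)

    def search(self, collection_name, query, limit=10):
        return self.store.get(collection_name, [])[:limit]

class AsyncVectorDBClient:
    """Thin async facade: each call is dispatched through asyncio.to_thread,
    forwarding *args/**kwargs so backend-specific parameters keep working."""
    def __init__(self, sync_client):
        self.sync = sync_client  # sync client stays available unchanged

    async def upsert(self, *args, **kwargs):
        return await asyncio.to_thread(self.sync.upsert, *args, **kwargs)

    async def search(self, *args, **kwargs):
        return await asyncio.to_thread(self.sync.search, *args, **kwargs)

VECTOR_DB_CLIENT = VectorDBClient()
ASYNC_VECTOR_DB_CLIENT = AsyncVectorDBClient(VECTOR_DB_CLIENT)

async def main():
    await ASYNC_VECTOR_DB_CLIENT.upsert("docs", [{"id": 1}])
    return await ASYNC_VECTOR_DB_CLIENT.search("docs", query="x", limit=5)

results = asyncio.run(main())
```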
* fix(retrieval): restore explicit AsyncVectorDBClient signatures matching VectorDBBase
Per PR review: the original *args/**kwargs forwarding lost type
safety and IDE/static-analysis support. Restore explicit signatures
that mirror VectorDBBase exactly, so:
* Bad kwargs fail at the facade boundary instead of inside the
worker thread (where the resulting TypeError tends to be
swallowed by surrounding `try/except`).
* IDE autocomplete and static analysis work as expected.
* The stated intent ("mirror VectorDBBase exactly") now holds at
the API contract level, not just behaviourally.
While doing this, surface a pre-existing bug in
`delete_entries_from_collection` that the stricter typing flagged:
the call passed `metadata={'hash': hash}` which is not a parameter
on `VectorDBBase.delete` nor any backend. The TypeError raised
inside the sync delete was silently swallowed by `except Exception`
so the endpoint always reported `{'status': False}` for every
request instead of actually deleting matching vectors. Replace with
`filter=...` to do what the endpoint name promises.
The review's other note (no concurrency/backpressure on the shared
default threadpool) is intentionally not addressed here:
asyncio.to_thread on the shared executor is the right primitive for
this use case; per-domain bounded executors would add lifecycle
complexity disproportionate to the problem and the loop is no
longer blocked, which was the actual bug.
https://claude.ai/code/session_01JSr4NZSskEUQvoJnavVXh8
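What the explicit mirror buys, shown on a single `delete` method (signatures simplified; the real base class has more parameters):

```python
import asyncio
from typing import Optional

class VectorDBBase:
    def delete(self, collection_name: str,
               ids: Optional[list[str]] = None,
               filter: Optional[dict] = None) -> None: ...

class SyncClient(VectorDBBase):
    def __init__(self):
        self.calls = []

    def delete(self, collection_name, ids=None, filter=None):
        self.calls.append((collection_name, ids, filter))

class AsyncVectorDBClient:
    def __init__(self, sync_client):
        self.sync = sync_client

    # Explicit signature mirroring VectorDBBase: a bad kwarg such as
    # metadata=... fails immediately at the facade boundary with a
    # TypeError, instead of inside the worker thread where surrounding
    # try/except tends to swallow it.
    async def delete(self, collection_name: str,
                     ids: Optional[list[str]] = None,
                     filter: Optional[dict] = None) -> None:
        return await asyncio.to_thread(
            self.sync.delete, collection_name, ids=ids, filter=filter
        )

async def main():
    client = AsyncVectorDBClient(SyncClient())
    await client.delete("docs", filter={"hash": "abc"})
    try:
        await client.delete("docs", metadata={"hash": "abc"})  # the old bug
    except TypeError:
        return "rejected at facade boundary"

outcome = asyncio.run(main())
```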
* fix(retrieval): parallelize hybrid-search collection prefetch; document async facade contracts
Address PR review findings:
1. Hybrid-search prefetch was sequential
`query_collection_with_hybrid_search` previously awaited
`ASYNC_VECTOR_DB_CLIENT.get(name)` once per collection in a for
loop. Each call already off-loaded to a worker thread, but
awaiting them serially meant total prefetch latency scaled
linearly with the number of collections. Run them concurrently
with `asyncio.gather` so multi-collection queries actually
benefit from the threadpool. Per-collection exception handling
is preserved by wrapping each fetch in a small helper that
logs and returns `(name, None)` on failure, so a single bad
collection cannot poison the whole gather.
2. Document the thread-safety expectation explicitly
The facade now formally states what was always implicit: the
sync `VECTOR_DB_CLIENT` is shared across worker threads, so the
underlying backend driver must be thread-safe. This is not a
new exposure — `save_docs_to_vector_db` already called the sync
client from `run_in_threadpool`. Adding a global lock here
would defeat the responsiveness the facade exists to provide;
backends that cannot tolerate concurrent access should grow
their own internal serialization.
3. Document the API-surface choice and `.sync` escape hatch
The strict `VectorDBBase` mirror was a deliberate choice (the
previous `*args/**kwargs` revision let a `metadata=` typo
silently break an endpoint). Document it, and call out the
`.sync` escape hatch with an example for callers that genuinely
need a backend-specific parameter not on `VectorDBBase`.
https://claude.ai/code/session_01JSr4NZSskEUQvoJnavVXh8
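The concurrent-prefetch shape, with a fake client standing in for `ASYNC_VECTOR_DB_CLIENT` (names simplified):

```python
import asyncio

async def fetch_collection(client, name):
    # Wrap each fetch so one bad collection returns (name, None)
    # instead of poisoning the whole gather.
    try:
        return name, await client.get(name)
    except Exception:
        return name, None

async def prefetch(client, collection_names):
    # Concurrent prefetch: total latency is roughly the max of the
    # per-collection fetches, not their sum.
    results = await asyncio.gather(
        *(fetch_collection(client, n) for n in collection_names)
    )
    return dict(results)

class FakeClient:
    async def get(self, name):
        if name == "bad":
            raise RuntimeError("backend error")
        await asyncio.sleep(0.01)
        return {"name": name}

collections = asyncio.run(prefetch(FakeClient(), ["a", "bad", "b"]))
```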
* fix(retrieval): guard /delete against null file.hash and let HTTPException reach the client
Address PR review finding on the `metadata=` → `filter=` change in
`delete_entries_from_collection`.
The new `filter={'hash': hash}` query was correct for files that
have a hash, but did not handle `file.hash is None` (unprocessed,
failed, or legacy records). The match semantics of a null filter
value are backend-dependent — some ignore the key entirely, some
treat it as "metadata field absent" and match every such row — so
issuing the query risked deleting unrelated entries.
* Reject `hash is None` up front with a 400 explaining the file
has no hash to target.
* Narrow the surrounding `except Exception` so it no longer
swallows `HTTPException`. Without this fix the new 400 (and the
pre-existing 404 for missing files) would be silently re-shaped
into `{'status': False}` and the caller could not distinguish a
bad-request input from a backend error.
https://claude.ai/code/session_01JSr4NZSskEUQvoJnavVXh8
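Both fixes together, in a self-contained sketch (the `HTTPException` class and fake client stand in for FastAPI and the real vector client):

```python
class HTTPException(Exception):
    # Stand-in for fastapi.HTTPException.
    def __init__(self, status_code, detail):
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail

class FakeVectorClient:
    def __init__(self):
        self.last_filter = None

    def delete(self, collection_name, filter=None):
        self.last_filter = filter

def delete_entries_from_collection(file, client):
    try:
        if file is None:
            raise HTTPException(404, "file not found")
        if file.get("hash") is None:
            # Null-filter semantics are backend-dependent: some backends
            # ignore the key, others match every row lacking the field.
            # Refuse up front rather than risk deleting unrelated entries.
            raise HTTPException(400, "file has no hash to target")
        client.delete("docs", filter={"hash": file["hash"]})
        return {"status": True}
    except HTTPException:
        raise  # let 400/404 reach the client instead of {'status': False}
    except Exception:
        return {"status": False}

client = FakeVectorClient()
ok = delete_entries_from_collection({"hash": "h1"}, client)
try:
    delete_entries_from_collection({"hash": None}, client)
    rejected = None
except HTTPException as e:
    rejected = e.status_code
```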
---------
Co-authored-by: Claude <noreply@anthropic.com>
* fix(middleware): replace BaseHTTPMiddleware HTTP middlewares with pure ASGI implementations
Starlette's BaseHTTPMiddleware (and the @app.middleware('http')
decorator that uses it) wraps the downstream app in an anyio task
group whose cancel scope tears down the inner task on every exit —
client disconnect, response complete, or any outer middleware bailing.
That CancelledError gets injected into whatever the inner task was
awaiting, so DB queries, embedding calls, and other long awaits get
killed mid-flight. Under aiosqlite the cleanup path then logs a
multi-page `terminate_force_close() not implemented` traceback at
ERROR for every cancelled DB call.
Open WebUI had four such middlewares stacked
(`commit_session_after_request`, `check_url`, `inspect_websocket`,
`RedirectMiddleware`) so a single cancellation would compound through
all four.
Move the four middlewares to a new `open_webui.utils.asgi_middleware`
module as plain ASGI classes (`__call__(scope, receive, send)`):
* `CommitSessionMiddleware` — was `commit_session_after_request`; now
  also rolls back if commit fails before releasing the connection.
* `AuthTokenMiddleware` — was `check_url`; sets request.state token and
  enable_api_keys, and stamps X-Process-Time via a wrapped send.
* `WebsocketUpgradeGuardMiddleware` — was `inspect_websocket`; rejects
  /ws/socket.io HTTP requests that claim transport=websocket without a
  proper Upgrade/Connection header.
* `RedirectMiddleware` — was the BaseHTTPMiddleware subclass; same
  /watch + share-target rewrites.
Pure ASGI does not introduce a cancel scope around the downstream app,
so client disconnects propagate via `receive()` (the way ASGI was
designed) instead of being injected as CancelledError. Middleware
ordering is preserved.
https://claude.ai/code/session_01JSr4NZSskEUQvoJnavVXh8
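The pure-ASGI shape, illustrated on the redirect case (the `/rewritten` target and inner app are hypothetical; the point is that `__call__(scope, receive, send)` wraps the downstream app without any task group or cancel scope):

```python
import asyncio

class RedirectMiddleware:
    """Pure ASGI middleware: no cancel scope around the downstream app,
    so client disconnects propagate via receive() as ASGI intends."""
    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] == "http" and scope["path"] == "/watch":
            scope = dict(scope)
            scope["path"] = "/rewritten"  # hypothetical rewrite target
        await self.app(scope, receive, send)

async def inner_app(scope, receive, send):
    await send({"type": "http.response.start", "status": 200, "headers": []})
    await send({"type": "http.response.body", "body": scope["path"].encode()})

async def main():
    app = RedirectMiddleware(inner_app)
    sent = []

    async def send(msg):
        sent.append(msg)

    async def receive():
        return {"type": "http.request", "body": b""}

    await app({"type": "http", "path": "/watch"}, receive, send)
    return sent

messages = asyncio.run(main())
```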
* fix(middleware): CommitSessionMiddleware — rollback on downstream error, never commit failed requests
The first cut put commit() in a finally block, which meant that even
when a downstream handler raised, the middleware would still commit
whatever partial sync writes that handler had made before the
failure. That regressed the previous BaseHTTPMiddleware semantics
where commit only ran on the success path.
Restructure the failure handling:
* Downstream raised → rollback any pending sync work, release the
connection, re-raise so the outer error middleware turns it into
an error response. We never commit a request that did not complete.
* Downstream returned → commit. On commit failure, log loudly,
rollback, and re-raise. ScopedSession.remove() always runs in
finally so the connection cannot leak.
Document the inherent pure-ASGI limitation explicitly: by the time
`await self.app(...)` returns the response messages have already
been emitted, so a commit failure can no longer change what the
client sees on the wire. Buffering the response to gate it on commit
success would break streaming responses (chat completions, SSE) which
are core to Open WebUI; the trade-off is intentional. Routes that
need commit-before-send must manage the sync session explicitly.
Also drop unused `typing` imports flagged by review.
https://claude.ai/code/session_01JSr4NZSskEUQvoJnavVXh8
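The restructured control flow, with a fake session standing in for the SQLAlchemy scoped session:

```python
import asyncio

class FakeSession:
    def __init__(self):
        self.committed = False
        self.rolled_back = False
        self.removed = False

    def commit(self):
        self.committed = True

    def rollback(self):
        self.rolled_back = True

    def remove(self):
        self.removed = True

class CommitSessionMiddleware:
    def __init__(self, app, session):
        self.app = app
        self.session = session

    async def __call__(self, scope, receive, send):
        try:
            await self.app(scope, receive, send)
        except Exception:
            # Downstream raised: never commit a request that did not
            # complete; re-raise for the outer error middleware.
            self.session.rollback()
            raise
        else:
            try:
                self.session.commit()
            except Exception:
                self.session.rollback()  # log loudly in the real code
                raise
        finally:
            self.session.remove()  # always runs: the connection cannot leak

async def failing_app(scope, receive, send):
    raise RuntimeError("handler failed")

async def ok_app(scope, receive, send):
    pass

async def main():
    failed = FakeSession()
    try:
        await CommitSessionMiddleware(failing_app, failed)({"type": "http"}, None, None)
    except RuntimeError:
        pass
    succeeded = FakeSession()
    await CommitSessionMiddleware(ok_app, succeeded)({"type": "http"}, None, None)
    return failed, succeeded

failed, succeeded = asyncio.run(main())
```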
---------
Co-authored-by: Claude <noreply@anthropic.com>
* fix: drop extra='allow' on FolderForm and FolderUpdateForm
These request models were configured to accept arbitrary extra fields,
which were then merged into the folder row via form_data.model_dump().
In insert_new_folder the server-assigned user_id is placed before the
form spread, so a client-supplied user_id in the request body would
override it and the folder would be persisted against another account.
Strictly typed inputs are the correct shape for these endpoints — the
client has no legitimate reason to send fields beyond the declared
ones, and dropping extra='allow' closes the mass-assignment sink at
the validation layer instead of relying on every callsite to merge
fields in the right order.
* fix: reject unknown fields on FolderForm and FolderUpdateForm
Address review feedback: dropping extra='allow' fell back to Pydantic
v2's default extra='ignore', which only silently drops unknown fields
instead of rejecting them. The intent for these request models is a
strict input contract — fail fast when a client sends anything the
server does not expect — so explicitly set extra='forbid'. This also
makes the hardening visible in the form definition rather than implicit
in the default.
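The contract difference in Pydantic v2, sketched on a simplified `FolderForm` (field set reduced to `name` for illustration):

```python
from pydantic import BaseModel, ConfigDict, ValidationError

class FolderForm(BaseModel):
    # extra='forbid' rejects unknown fields outright. The Pydantic v2
    # default, extra='ignore', would only drop them silently, and the
    # old extra='allow' let a client-supplied user_id survive into the
    # model_dump() merge.
    model_config = ConfigDict(extra="forbid")
    name: str

ok = FolderForm(name="Research")
try:
    FolderForm(name="Research", user_id="someone-else")
    forbidden = False
except ValidationError:
    forbidden = True
```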