143 Commits

Author SHA1 Message Date
Timothy Jaeryang Baek
4d2f189810 feat: add RAG_RERANKING_BATCH_SIZE configuration option
Add configurable reranker batch size (env var RAG_RERANKING_BATCH_SIZE,
default 32) following the same pattern as RAG_EMBEDDING_BATCH_SIZE.

- config.py: PersistentConfig for RAG_RERANKING_BATCH_SIZE
- main.py: import, state init, pass to get_reranking_function
- colbert.py: accept batch_size param in predict() (was hardcoded 32)
- utils.py: get_reranking_function passes batch_size at call time
- retrieval.py: expose in config GET/POST endpoints and ConfigForm
- Documents.svelte: add Reranking Batch Size input in admin settings
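
A rough sketch of the idea, not the project's actual code: the batch size is read from the environment with the default of 32 and then used to chunk the query/document pairs before scoring. The helper names `batch_size_from_env` and `predict_in_batches`, and the `score_batch` callable, are hypothetical; the real change routes the value through PersistentConfig and `get_reranking_function`.

```python
import os
from typing import Callable, Mapping, Sequence


def batch_size_from_env(env: Mapping[str, str] = os.environ) -> int:
    # Env var name and default of 32 come from the commit message;
    # reading it directly like this is a simplification.
    return int(env.get("RAG_RERANKING_BATCH_SIZE", "32"))


def predict_in_batches(
    pairs: Sequence[tuple],
    score_batch: Callable[[Sequence[tuple]], list],
    batch_size: int = 32,
) -> list:
    """Score (query, document) pairs in chunks of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    scores: list = []
    for start in range(0, len(pairs), batch_size):
        # Each slice is one batch handed to the (hypothetical) scorer.
        scores.extend(score_batch(pairs[start:start + batch_size]))
    return scores


# Usage: record the batch sizes a dummy scorer receives.
seen_batch_sizes: list = []


def dummy_scorer(batch: Sequence[tuple]) -> list:
    seen_batch_sizes.append(len(batch))
    return [float(len(doc)) for _query, doc in batch]


pairs = [("q", "d" * i) for i in range(70)]
scores = predict_in_batches(pairs, dummy_scorer, batch_size=batch_size_from_env({}))
```

With 70 pairs and the default batch size, the dummy scorer is called three times; the last batch holds the remainder.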

Closes #23730
2026-04-17 08:35:45 +09:00
Timothy Jaeryang Baek
5dae600ce7 chore: format 2026-04-14 17:27:31 -05:00
Classic298
804f9f3153 fix(retrieval): offload sync VECTOR_DB_CLIENT calls in async paths via AsyncVectorDBClient (#23706)
* fix(retrieval): offload sync VECTOR_DB_CLIENT calls in async paths via AsyncVectorDBClient

The vector DB backends (Chroma, pgvector, Qdrant, Milvus, Pinecone,
Weaviate, …) are uniformly synchronous and their methods perform
blocking network or disk I/O. Multiple async route handlers and helpers
were calling them directly on the event loop — file processing,
memories, knowledge bases, hybrid search bookkeeping — so a single
upsert/delete/search would freeze every other in-flight request for the
duration of the call.

Introduce `AsyncVectorDBClient`, a thin async facade that wraps the
existing sync client and dispatches each method through
`asyncio.to_thread`. It mirrors `VectorDBBase` exactly and forwards
*args/**kwargs so backend-specific extra parameters keep working.

Update every async-context call site (routers/retrieval, routers/files,
routers/memories, routers/knowledge, retrieval/utils,
tools/builtin) to await `ASYNC_VECTOR_DB_CLIENT` instead of calling the
sync client directly. Two helpers that were sync-only also acquire
async siblings or are awaited via `asyncio.to_thread` at their async
call site (`remove_knowledge_base_metadata_embedding`,
`get_all_items_from_collections`, `query_doc`).

The original sync `VECTOR_DB_CLIENT` is unchanged, so callers that
already run inside `run_in_threadpool` (e.g. `save_docs_to_vector_db`
and the sync `query_doc`/`get_doc` helpers) are unaffected.
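
A minimal sketch of the facade pattern described above, under stated assumptions: `SyncVectorClient` and `AsyncFacade` are hypothetical stand-ins, not the project's `VectorDBBase` or `AsyncVectorDBClient`, and only one method is shown. The heartbeat task is there only to show that the event loop keeps running while the blocking call sits in a worker thread.

```python
import asyncio
import time


class SyncVectorClient:
    """Hypothetical stand-in for a blocking vector DB client."""

    def search(self, collection_name: str, limit: int = 5) -> list:
        time.sleep(0.05)  # simulate blocking network or disk I/O
        return [f"{collection_name}-hit-{i}" for i in range(limit)]


class AsyncFacade:
    """Thin async wrapper: the sync call runs in a worker thread."""

    def __init__(self, sync_client: SyncVectorClient) -> None:
        self.sync = sync_client  # wrapped client stays reachable (illustrative name)

    async def search(self, collection_name: str, limit: int = 5) -> list:
        # asyncio.to_thread (Python 3.9+) hands the call to the default executor.
        return await asyncio.to_thread(self.sync.search, collection_name, limit=limit)


async def demo() -> tuple:
    facade = AsyncFacade(SyncVectorClient())
    ticks = 0

    async def heartbeat() -> None:
        nonlocal ticks
        while True:
            await asyncio.sleep(0.005)
            ticks += 1  # only advances if the loop is not blocked

    beat = asyncio.create_task(heartbeat())
    hits = await facade.search("docs", limit=2)
    beat.cancel()
    try:
        await beat
    except asyncio.CancelledError:
        pass
    return hits, ticks


hits, ticks = asyncio.run(demo())
```

If `search` were called directly on the event loop instead, the heartbeat counter would stay at zero for the duration of the call.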

https://claude.ai/code/session_01JSr4NZSskEUQvoJnavVXh8

* fix(retrieval): restore explicit AsyncVectorDBClient signatures matching VectorDBBase

Per PR review: the original *args/**kwargs forwarding lost type
safety and IDE/static-analysis support. Restore explicit signatures
that mirror VectorDBBase exactly, so:

  * Bad kwargs fail at the facade boundary instead of inside the
    worker thread (where the resulting TypeError tends to be
    swallowed by surrounding `try/except`).
  * IDE autocomplete and static analysis work as expected.
  * The stated intent ("mirror VectorDBBase exactly") now holds at
    the API contract level, not just behaviourally.

While doing this, surface a pre-existing bug in
`delete_entries_from_collection` that the stricter typing flagged:
the call passed `metadata={'hash': hash}` which is not a parameter
on `VectorDBBase.delete` nor any backend. The TypeError raised
inside the sync delete was silently swallowed by `except Exception`
so the endpoint always reported `{'status': False}` for every
request instead of actually deleting matching vectors. Replace with
`filter=...` to do what the endpoint name promises.

The thorough review's other note (no concurrency/backpressure on
the shared default threadpool) is intentionally not addressed here:
asyncio.to_thread on the shared executor is the right primitive for
this use case; per-domain bounded executors would add lifecycle
complexity disproportionate to the problem and the loop is no
longer blocked, which was the actual bug.

https://claude.ai/code/session_01JSr4NZSskEUQvoJnavVXh8

* fix(retrieval): parallelize hybrid-search collection prefetch; document async facade contracts

Address PR review findings:

1. Hybrid-search prefetch was sequential
   `query_collection_with_hybrid_search` previously awaited
   `ASYNC_VECTOR_DB_CLIENT.get(name)` once per collection in a for
   loop. Each call already off-loaded to a worker thread, but
   awaiting them serially meant total prefetch latency scaled
   linearly with the number of collections. Run them concurrently
   with `asyncio.gather` so multi-collection queries actually
   benefit from the threadpool. Per-collection exception handling
   is preserved by wrapping each fetch in a small helper that
   logs and returns `(name, None)` on failure, so a single bad
   collection cannot poison the whole gather.

2. Document the thread-safety expectation explicitly
   The facade now formally states what was always implicit: the
   sync `VECTOR_DB_CLIENT` is shared across worker threads, so the
   underlying backend driver must be thread-safe. This is not a
   new exposure — `save_docs_to_vector_db` already called the sync
   client from `run_in_threadpool`. Adding a global lock here
   would defeat the responsiveness the facade exists to provide;
   backends that cannot tolerate concurrent access should grow
   their own internal serialization.

3. Document the API-surface choice and `.sync` escape hatch
   The strict `VectorDBBase` mirror was a deliberate choice (the
   previous `*args/**kwargs` revision let a `metadata=` typo
   silently break an endpoint). Document it, and call out the
   `.sync` escape hatch with an example for callers that genuinely
   need a backend-specific parameter not on `VectorDBBase`.
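
The concurrent prefetch from item 1 can be sketched roughly as follows. This is an illustration only: `blocking_get`, `fetch_one`, and `prefetch` are hypothetical names, and the real helper wraps `ASYNC_VECTOR_DB_CLIENT.get(name)` rather than a local function.

```python
import asyncio
import logging
from typing import Optional

log = logging.getLogger(__name__)


def blocking_get(name: str) -> dict:
    """Hypothetical blocking fetch; one collection fails to show isolation."""
    if name == "broken":
        raise RuntimeError("backend error")
    return {"collection": name, "documents": [f"{name}-doc"]}


async def fetch_one(name: str) -> tuple:
    """Fetch one collection; log and return (name, None) on failure."""
    try:
        result: Optional[dict] = await asyncio.to_thread(blocking_get, name)
        return name, result
    except Exception as exc:
        log.warning("Failed to fetch collection %s: %s", name, exc)
        return name, None


async def prefetch(names: list) -> dict:
    # gather runs all fetches concurrently and preserves input order.
    pairs = await asyncio.gather(*(fetch_one(n) for n in names))
    return dict(pairs)


results = asyncio.run(prefetch(["a", "broken", "b"]))
```

Because each fetch catches its own exception, `gather` never sees a failure and the healthy collections are still returned.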

https://claude.ai/code/session_01JSr4NZSskEUQvoJnavVXh8

* fix(retrieval): guard /delete against null file.hash and let HTTPException reach the client

Address PR review finding on the `metadata=` → `filter=` change in
`delete_entries_from_collection`.

The new `filter={'hash': hash}` query was correct for files that
have a hash, but did not handle `file.hash is None` (unprocessed,
failed, or legacy records). The match semantics of a null filter
value are backend-dependent — some ignore the key entirely, some
treat it as "metadata field absent" and match every such row — so
issuing the query risked deleting unrelated entries.

  * Reject `hash is None` up front with a 400 explaining the file
    has no hash to target.

  * Narrow the surrounding `except Exception` so it no longer
    swallows `HTTPException`. Without this fix the new 400 (and the
    pre-existing 404 for missing files) would be silently re-shaped
    into `{'status': False}` and the caller could not distinguish a
    bad-request input from a backend error.
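
A sketch of the two changes above, assuming a simplified handler: `HTTPException` here is a local stand-in class rather than the framework's, and `delete_by_hash` and `backend_delete` are hypothetical names.

```python
class HTTPException(Exception):
    """Local stand-in for a web framework's HTTP error (assumption)."""

    def __init__(self, status_code: int, detail: str) -> None:
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


def delete_by_hash(file_hash, backend_delete) -> dict:
    try:
        if file_hash is None:
            # Reject up front: a null filter value has backend-dependent semantics.
            raise HTTPException(400, "File has no hash to target")
        backend_delete(filter={"hash": file_hash})
        return {"status": True}
    except HTTPException:
        raise  # let client-facing errors propagate unchanged
    except Exception:
        return {"status": False}  # genuine backend failure


# Usage: a recording backend, a failing backend, and a null hash.
calls: list = []
ok = delete_by_hash("abc", lambda filter: calls.append(filter))


def failing_backend(filter):
    raise RuntimeError("backend down")


backend_error = delete_by_hash("abc", failing_backend)

try:
    delete_by_hash(None, lambda filter: None)
    null_hash_status = None
except HTTPException as exc:
    null_hash_status = exc.status_code
```

The caller can now tell the three outcomes apart: success, a backend error reported as `{'status': False}`, and a 400 for a missing hash.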

https://claude.ai/code/session_01JSr4NZSskEUQvoJnavVXh8

---------

Co-authored-by: Claude <noreply@anthropic.com>
2026-04-14 10:50:18 -05:00
Timothy Jaeryang Baek
27169124f2 refac: async db 2026-04-12 14:22:11 -05:00
Timothy Jaeryang Baek
6d736d3c59 refac 2026-03-26 19:01:33 -05:00
Timothy Jaeryang Baek
350d52f515 chore: format 2026-03-25 16:43:06 -05:00
Timothy Jaeryang Baek
8b6fa1f4ab refac 2026-03-24 20:14:28 -05:00
Timothy Jaeryang Baek
d738044f47 refac 2026-03-24 17:03:08 -05:00
Timothy Jaeryang Baek
9a2c60d595 refac 2026-03-21 17:12:33 -05:00
Timothy Jaeryang Baek
de3317e26b refac 2026-03-17 17:58:01 -05:00
Timothy Jaeryang Baek
352391fa76 chore: format 2026-03-08 18:14:09 -05:00
Alvin Tang
2c35bdbcf5 fix: replace bare string raises with proper exception types (#22446)
`raise "string"` in Python raises TypeError instead of the intended
error, making error messages confusing and debugging difficult.
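
A short illustration of the behaviour in Python 3; the function names are made up for the example.

```python
def bad() -> None:
    # A bare string is not an exception, so Python raises TypeError instead.
    raise "something went wrong"


def good() -> None:
    raise ValueError("something went wrong")


def describe(fn) -> tuple:
    """Return the type name and message of whatever fn raises."""
    try:
        fn()
    except Exception as exc:
        return type(exc).__name__, str(exc)
    return "no error", ""


bad_result = describe(bad)
good_result = describe(good)
```

`bad_result` reports a `TypeError` whose message says nothing about the original problem, while `good_result` carries the intended type and text.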

Co-authored-by: gambletan <ethanchang32@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 16:39:09 -05:00
Classic298
0b851cf55a fix: offline model retrieval, re-raise to disable instead of returning useless fallback (#22106)
Co-authored-by: ahxxm <1286225+ahxxm@users.noreply.github.com>
2026-03-01 13:52:31 -05:00
Timothy Jaeryang Baek
d9fd2a3f30 refac 2026-02-22 18:42:25 -06:00
Timothy Jaeryang Baek
631e30e22d refac 2026-02-21 15:35:34 -06:00
Timothy Jaeryang Baek
5d4547f934 enh: RAG_EMBEDDING_CONCURRENT_REQUESTS 2026-02-21 14:33:48 -06:00
VasilyLebedev123
6d67ac371d fix: correct unpacking order of distances, documents, and metadatas in hybrid search query (#21562)
Co-authored-by: Vasily Lebedev <Vasily.Lebedev@sapowernetworks.com.au>
2026-02-19 16:38:40 -06:00
Timothy Jaeryang Baek
f376d4f378 chore: format 2026-02-11 16:24:11 -06:00
Timothy Jaeryang Baek
cd31b8301b refac 2026-02-10 12:44:31 -06:00
Timothy Jaeryang Baek
9747b07ca5 refac 2026-02-08 21:24:38 -06:00
Timothy Jaeryang Baek
e67891a374 refac 2026-01-08 00:42:29 +04:00
Timothy Jaeryang Baek
e4a5b06ca6 enh: embedding_batch_size for local embedding engine 2026-01-01 16:06:42 +04:00
Timothy Jaeryang Baek
dfc5dad631 enh: REQUESTS_VERIFY 2026-01-01 01:27:07 +04:00
okamototk
37085ed42b chore: update langchain 1.2.0 (#19991)
* chore: update langchain 1.2.0

* chore: format
2025-12-20 08:50:44 -05:00
Classic298
2e7c7d635d fix: prevent ExternalReranker from blocking event loop during RAG queries (#20049)
* fix: prevent ExternalReranker from blocking event loop during RAG queries (#120)

Co-authored-by: Tim Baek <tim@openwebui.com>
Co-authored-by: Claude <noreply@anthropic.com>
Fixes #19900

* Merge pull request open-webui#19030 from open-webui/dev (#122)

Co-authored-by: Tim Baek <tim@openwebui.com>
Co-authored-by: Claude <noreply@anthropic.com>
Fixes #19900

---------

Co-authored-by: Tim Baek <tim@openwebui.com>
Co-authored-by: Claude <noreply@anthropic.com>
2025-12-20 08:43:40 -05:00
Classic298
823b9a6dd9 chore/perf: Remove old SRC level log env vars with no impact (#20045)
* Update openai.py

* Update env.py

* Merge pull request open-webui#19030 from open-webui/dev (#119)

Co-authored-by: Tim Baek <tim@openwebui.com>
Co-authored-by: Claude <noreply@anthropic.com>

---------

Co-authored-by: Tim Baek <tim@openwebui.com>
Co-authored-by: Claude <noreply@anthropic.com>
2025-12-20 08:16:14 -05:00
Timothy Jaeryang Baek
d19023288e feat/enh: kb files db migration 2025-12-02 10:53:32 -05:00
Timothy Jaeryang Baek
2328dc284e feat/enh: async embedding processing setting
Co-Authored-By: Classic298 <27028174+Classic298@users.noreply.github.com>
2025-11-25 01:55:43 -05:00
Timothy Jaeryang Baek
06f0bfd9f5 fix 2025-11-24 05:58:22 -05:00
Timothy Jaeryang Baek
662a1fac47 fix: hybrid search 2025-11-24 05:52:18 -05:00
Timothy Jaeryang Baek
48d1e67e79 chore: format 2025-11-23 20:15:52 -05:00
Classic298
60dbde7e19 chore (#19389) 2025-11-23 04:40:05 -05:00
Timothy Jaeryang Baek
f9c96d03ad refac 2025-11-22 22:57:27 -05:00
Timothy Jaeryang Baek
9bfc414d26 refac 2025-11-22 21:33:14 -05:00
Classic298
902c6cfbea perf: 50x performance improvement for external embeddings (#19296)
* Update utils.py (#77)

Co-authored-by: Claude <noreply@anthropic.com>

* refactor: address code review feedback for embedding performance improvements (#92)

Co-authored-by: Claude <noreply@anthropic.com>

* fix: prevent sentence transformers from blocking async event loop (#95)

Co-authored-by: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-11-22 20:54:59 -05:00
Jacob Leksan
07ef295a77 feat: Adding file metadata to hybrid search (#19095)
* Added metadata to hybrid search

* And config and env plus refac

* consistency

---------

Co-authored-by: Tim Baek <tim@openwebui.com>
2025-11-18 15:29:07 -05:00
Timothy Jaeryang Baek
bc739de024 refac: rerank 2025-11-09 21:33:50 -05:00
krishna-medapati
684324ae9e fix: Handle AttributeError in hybrid search with reranking (#17046)
- Split attribute existence checks from document content checks
- Added hasattr() check for metadatas attribute
- Prevents AttributeError when collection_result is missing attributes
- Maintains all original validation logic

Fixes #17046
2025-11-07 23:31:11 +05:30
Wang Weixuan
5e17882488 fix: use trusted env in web search loader
Signed-off-by: Wang Weixuan <wangweixvan@gmail.com>
2025-10-28 04:58:00 +08:00
Timothy Jaeryang Baek
8197844ff7 refac 2025-10-26 17:22:23 -07:00
Timothy Jaeryang Baek
4e763e8aa8 refac 2025-10-09 16:16:24 -05:00
Timothy Jaeryang Baek
b98d8aa8ec refac 2025-10-07 07:31:06 -05:00
Timothy Jaeryang Baek
a2a2bafdf6 enh/refac: url input handling 2025-10-04 02:02:26 -05:00
Timothy Jaeryang Baek
6e4a2f18e1 refac 2025-09-21 00:14:43 -04:00
Timothy Jaeryang Baek
a51f0c30ec refac/fix: knowledge permission 2025-09-15 11:40:31 -05:00
Timothy Jaeryang Baek
e61e7434a0 refac 2025-09-14 10:46:49 +02:00
Timothy Jaeryang Baek
1ef8204359 refac 2025-09-14 10:45:52 +02:00
Timothy Jaeryang Baek
58d7ca35e3 refac 2025-09-14 10:27:07 +02:00
Timothy Jaeryang Baek
aa8ab349ed feat: ref chat 2025-09-14 10:26:46 +02:00
Timothy Jaeryang Baek
210197fd43 refac/fix: web/youtube file attachment handling 2025-09-13 00:02:48 +04:00