[GH-ISSUE #17332] issue: Hybrid scroll loops + reranker overrides system prompt (0.6.25 OK, latest very slow) #18244

Closed
opened 2026-04-20 00:27:01 -05:00 by GiteaMirror · 2 comments
Owner

Originally created by @galvanoid on GitHub (Sep 10, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/17332

Check Existing Issues

  • I have searched for any existing and/or related issues.
  • I have searched for any existing and/or related discussions.
  • I am using the latest version of Open WebUI.

Installation Method

Docker

Open WebUI Version

0.6.27

Ollama Version (if applicable)

No response

Operating System

Ubuntu 24.04

Browser (if applicable)

No response

Confirmation

  • I have read and followed all instructions in README.md.
  • I am using the latest version of both Open WebUI and Ollama.
  • I have included the browser console logs.
  • I have included the Docker container logs.
  • I have provided every relevant configuration, setting, and environment variable used in my setup.
  • I have clearly listed every relevant configuration, custom setting, environment variable, and command-line option that influences my setup (such as Docker Compose overrides, .env values, browser settings, authentication configurations, etc).
  • I have documented step-by-step reproduction instructions that are precise, sequential, and leave nothing to interpretation. My steps:
  • Start with the initial platform/version/OS and dependencies used,
  • Specify exact install/launch/configure commands,
  • List URLs visited, user input (incl. example values/emails/passwords if needed),
  • Describe all options and toggles enabled or changed,
  • Include any files or environmental changes,
  • Identify the expected and actual result at each stage,
  • Ensure any reasonably skilled user can follow and hit the same issue.

Expected Behavior

Expected behavior (from v0.6.25)

With BM25 weighting at pure lexical ("pure BM25"):

Retrieval completes in a single round with /points/query to Qdrant.

End-to-end latency < 4 minutes on the given corpus.

System prompt is preserved and the assistant uses the configured name and output structure.

Actual Behavior

Actual behavior (latest)

Even with BM25 weighting set to pure lexical (slider now at top right):

  • Starting hybrid search for 2 queries... appears repeatedly in the logs.

  • Qdrant is called via /points/scroll (not /points/query) many times.

  • Embeddings are still used (embedding_config ... snowflake-arctic-embed2).

  • CPU shows 6–8 repeated spikes (heavy cycles) and total latency is ~15 minutes.

When the reranker is ON (hybrid search toggle), the assistant loses the system prompt (answers claim a different name and ignore the required formatting).
When the reranker is OFF, the system prompt is honored again.

Steps to Reproduce

1. Upgrade to the latest Open WebUI.

2. In Retrieval settings:

   • Set BM25 weight = 1.0 (full lexical, slider at top right).

   • Disable dense/semantic weighting in the UI, slider fully to the right (I also tried the top-left position in case the switch names were swapped).

3. Run a query. Example used:
   Responsabilidad patrimonial de la administración en derecho español
   (roughly: "liability of the public administration under Spanish law")

4. Observe the logs and CPU usage.

Logs & Screenshots

Latest (regressed) — multiple scroll cycles, hybrid active, reranker contaminates:
https://ibb.co/bgZSH5vV

```
INFO open_webui.retrieval.utils:query_collection_with_hybrid_search:357 - Starting hybrid search for 2 queries in 1 collections...
INFO httpx._client:_send_single_request - POST http://127.0.0.1:6333/collections/open-webui_knowledge/points/scroll "HTTP/1.1 200 OK"
...
INFO open_webui.retrieval.models.external:predict - ExternalReranker:predict:model qwen3:1.7b
INFO open_webui.retrieval.models.external:predict - ExternalReranker:predict:query responsabilidad patrimonial de la administración en derecho español
...
query_doc_with_hybrid_search:result ... "embedding_config": {"engine":"ollama","model":"snowflake-arctic-embed2:latest"} ...
```

Note: the repeated Starting hybrid search... + /points/scroll loops and the presence of embedding_config show that the dense path is still active despite "pure BM25".
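To make the expected behavior concrete, here is a minimal, hypothetical sketch of weighted hybrid score fusion (this is not Open WebUI's actual code; `bm25_fn`, `dense_fn`, and the fusion formula are assumptions for illustration). With a BM25 weight of 1.0 the dense scores contribute nothing to the fused result, so a correct "pure BM25" path could skip the embedding call and the dense retrieval round entirely — which is exactly what the logs show is not happening:

```python
# Hypothetical sketch of linear hybrid score fusion (not Open WebUI's
# actual implementation): score = w * bm25 + (1 - w) * dense.

def fuse_scores(bm25_scores, dense_scores, bm25_weight):
    """Combine per-document scores from both retrievers by weighted sum."""
    return {
        doc_id: bm25_weight * bm25_scores.get(doc_id, 0.0)
        + (1.0 - bm25_weight) * dense_scores.get(doc_id, 0.0)
        for doc_id in set(bm25_scores) | set(dense_scores)
    }

def hybrid_search(query, bm25_weight, bm25_fn, dense_fn):
    bm25 = bm25_fn(query)
    # The optimization this report implies is missing: skip the dense
    # (embedding + vector search) path entirely when its weight is zero.
    dense = dense_fn(query) if bm25_weight < 1.0 else {}
    return fuse_scores(bm25, dense, bm25_weight)

if __name__ == "__main__":
    calls = {"dense": 0}

    def bm25_fn(q):
        return {"doc1": 0.9, "doc2": 0.4}

    def dense_fn(q):
        calls["dense"] += 1
        return {"doc2": 0.8, "doc3": 0.7}

    scores = hybrid_search("q", 1.0, bm25_fn, dense_fn)
    print(calls["dense"], scores["doc1"])  # 0 0.9 — dense path never invoked
```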

v0.6.25 (good) — /points/query, fewer cycles, system prompt preserved:
https://ibb.co/Ng0tSp1z

```
INFO httpx._client:_send_single_request - POST http://127.0.0.1:6333/collections/open-webui_knowledge/points/query "HTTP/1.1 200 OK"
...
INFO open_webui.retrieval.models.external:predict - ExternalReranker:predict:model qwen3:1.7b
...
query_doc_with_hybrid_search:result ... "embedding_config": {"engine":"ollama","model":"snowflake-arctic-embed2:latest"} ...
```

Still shows "hybrid" entries, but, importantly, it uses /points/query (fast ANN top-k) and the assistant keeps the system prompt (name and structure are OK).
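The latency gap between the two endpoints follows from their shapes: /points/scroll pages through the whole collection (one round trip per page, all ranking client-side), while /points/query returns only the server-side top-k in a single round trip. A small illustrative sketch with hypothetical numbers (300k points, 100-point pages — the corpus here is "hundreds of thousands of chunks"):

```python
# Illustration of scroll-style pagination vs. a single top-k query.
# Numbers are hypothetical; the point is the O(N / page_size) round-trip
# count of scrolling versus one round trip for a server-side top-k.
import heapq

def scroll_all(points, page_size=100):
    """Scroll-style retrieval: fetch every point page by page, rank client-side."""
    round_trips = 0
    fetched = []
    for i in range(0, len(points), page_size):
        fetched.extend(points[i : i + page_size])  # one round trip per page
        round_trips += 1
    top = heapq.nlargest(10, fetched, key=lambda p: p["score"])
    return round_trips, top

def query_top_k(points, k=10):
    """Query-style retrieval: the server returns only the top-k (one round trip)."""
    return 1, heapq.nlargest(k, points, key=lambda p: p["score"])

points = [{"id": i, "score": (i * 2654435761) % 1000} for i in range(300_000)]
scroll_trips, scroll_top = scroll_all(points)
query_trips, query_top = query_top_k(points)
print(scroll_trips, query_trips)  # 3000 1
```

Both paths return the same top-10 here; the difference is the 3000-to-1 ratio in round trips (plus transferring and scoring every point client-side), which matches the multiple heavy CPU cycles reported above.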

Additional Information

On the latest Open WebUI build, pure BM25 (lexical) retrieval becomes extremely slow (~15 min per query) with 6–8 CPU spikes. In addition, when the reranker is enabled, the assistant loses the system prompt (e.g., “forgets its name” and ignores response structure).

On v0.6.25, the same setup finishes in <4 minutes, shows a single CPU plateau, and respects the system prompt even with the reranker turned on.

Impact

Massive latency increase (≈4 min → ≈15 min)

Multiple heavy CPU rounds (sawtooth profile) instead of one

System prompt not honored when reranker is enabled (answers deviate from required format and assistant name)

Environment

Open WebUI:

Working: v0.6.25

Regressed: latest (as of 2025-09-10)

Vector DB: Qdrant (CPU build), local at http://127.0.0.1:6333

Reranker model (external): qwen3:1.7b

Embedding model observed in logs: snowflake-arctic-embed2

Corpus size: hundreds of thousands of chunks/documents

Hardware: CPU-only for retrieval; GPUs present but unused for Qdrant in this test

GiteaMirror added the bug label 2026-04-20 00:27:01 -05:00
Author
Owner

@tjbck commented on GitHub (Sep 10, 2025):

@rgaricano

Author
Owner

@tjbck commented on GitHub (Sep 10, 2025):

System prompt part being ignored is most likely caused by the retrieved context being much larger than what the model can handle.
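That explanation can be sketched concretely. Assuming a naive truncation strategy that keeps only the most recent tokens when the input exceeds the context window (a common default, though whether this is the exact mechanism here is an assumption — the tokenizer below is a toy "1 word = 1 token" stand-in):

```python
# Toy sketch of the failure mode: the system prompt sits at the front of
# the input, so "keep the last `window` tokens" truncation drops it first
# once the retrieved context is large enough.

def build_input(system_prompt, retrieved_context, user_query, window):
    tokens = (system_prompt + " " + retrieved_context + " " + user_query).split()
    # Naive strategy: keep only the most recent `window` tokens.
    return tokens[-window:]

system = "You are LexBot answer in numbered sections"
context = " ".join(["chunk"] * 5000)   # oversized retrieved context
query = "responsabilidad patrimonial"

kept = build_input(system, context, query, window=4096)
print("LexBot" in kept)  # False — the system prompt fell out of the window
```

This would also explain why the problem correlates with the reranker/hybrid toggle: a retrieval path that returns far more (or far larger) chunks pushes the total past the window, and the system prompt is the first casualty.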


Reference: github-starred/open-webui#18244