Mirror of https://github.com/open-webui/open-webui.git (synced 2026-05-06 19:08:59 -05:00)
[GH-ISSUE #17332] issue: Hybrid scroll loops + reranker overrides system prompt (0.6.25 OK, latest very slow) #33773
Originally created by @galvanoid on GitHub (Sep 10, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/17332
Installation Method
Docker
Open WebUI Version
0.6.27
Ollama Version (if applicable)
No response
Operating System
Ubuntu 24.04
Browser (if applicable)
No response
Expected Behavior
Expected behavior (from v0.6.25)
With BM25 weighting set to pure lexical (“pure BM25”):
Retrieval completes in a single round with /points/query to Qdrant.
End-to-end latency < 4 minutes on the given corpus.
System prompt is preserved and the assistant uses the configured name and output structure.
Actual Behavior
Actual behavior (latest)
Even with BM25 weighting set to pure lexical (slider at the top right in the current UI):
The log prints “Starting hybrid search for 2 queries...” repeatedly.
Qdrant /points/scroll (not /points/query) is called many times.
Embeddings are used anyway (embedding_config ... snowflake-arctic-embed2).
CPU shows 6–8 repeated spikes (heavy cycles) and total latency ~15 minutes.
When reranker is ON (hybrid search toggle) the assistant loses the system prompt (answers claim a different name and ignore formatting).
When reranker is OFF, system prompt is honored again.
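The difference between the two Qdrant endpoints matters for latency: /points/query returns a top-k result in one request, while /points/scroll pages through stored points. The sketch below is not Open WebUI's code; it only builds the JSON bodies for the two REST calls and estimates how many requests a full scroll needs. The helper names, page size, and collection size are assumptions for illustration, and whether the latest build really scrolls the whole collection on every pass is only inferred from the logs.

```python
import math


def query_body(vector, top_k=10):
    """Body for POST /collections/{name}/points/query: a single top-k search."""
    return {"query": list(vector), "limit": top_k, "with_payload": True}


def scroll_body(page_size=1000, offset=None):
    """Body for POST /collections/{name}/points/scroll: one page of points."""
    body = {"limit": page_size, "with_payload": True}
    if offset is not None:
        # The offset is the point id returned as next_page_offset by the previous page.
        body["offset"] = offset
    return body


def scroll_round_trips(total_points, page_size=1000):
    """Requests needed to page through every point once."""
    return math.ceil(total_points / page_size)


def total_scroll_requests(total_points, page_size, passes):
    """Requests needed if the whole collection is scrolled `passes` times."""
    return passes * scroll_round_trips(total_points, page_size)
```

For a hypothetical collection of 300,000 chunks and 1,000 points per page, one full pass is 300 requests. If each of the 6–8 CPU spikes corresponded to a full pass (an assumption), eight passes would be 2,400 requests, compared with a single /points/query call.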
Steps to Reproduce
Upgrade to latest Open WebUI
In Retrieval settings:
Set BM25 weight = 1.0 (full lexical, top right).
Disable dense/semantic weighting in the UI by moving the slider fully to the right (I also tried the top left in case the switch labels were swapped).
Run a query (example used):
Responsabilidad patrimonial de la administración en derecho español (in English: “State liability of the public administration under Spanish law”)
Observe logs and CPU.
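To make the expectation behind “BM25 weight = 1.0” concrete, here is a minimal sketch of weighted reciprocal rank fusion, a common way to combine a lexical and a dense ranking. It is an assumption that the retriever fuses results this way; the function names and the constant c=60 are illustrative and not taken from Open WebUI. With a BM25 weight of 1.0 the dense ranking contributes a score of zero, so the dense search (and its embedding call) could be skipped entirely.

```python
def weighted_rrf(rankings, weights, c=60):
    """Fuse ranked lists of document ids with weighted reciprocal rank fusion."""
    scores = {}
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (rank + c)
    # sorted() is stable, so documents with equal scores keep first-seen order.
    return sorted(scores, key=lambda d: scores[d], reverse=True)


def fuse(bm25_ranking, dense_ranking, bm25_weight):
    """Combine both rankings; the dense side gets weight 1 - bm25_weight."""
    return weighted_rrf(
        [bm25_ranking, dense_ranking],
        [bm25_weight, 1.0 - bm25_weight],
    )


def needs_dense_search(bm25_weight):
    """The dense path only affects the result when its weight is non-zero."""
    return bm25_weight < 1.0
```

With bm25_weight=1.0 the fused order of the BM25 hits equals the BM25 order, and needs_dense_search returns False, which is the behavior the steps above expect from a “pure BM25” setting.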
Logs & Screenshots
Latest (regressed): multiple scroll cycles, hybrid search active, system prompt lost when the reranker is on:
https://ibb.co/bgZSH5vV
Note: the repeated “Starting hybrid search...” entries, the /points/scroll loops, and the presence of embedding_config indicate that the dense path is still active despite “pure BM25”.
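Since the logs are only available as screenshots, a small counter can make the comparison between versions measurable. The sketch below counts how many log lines contain each of the markers named in this report; the marker list and the function name are assumptions, and the exact log format may differ between builds.

```python
from collections import Counter

# Substrings mentioned in this report; adjust them to the actual log format.
MARKERS = (
    "Starting hybrid search",
    "/points/scroll",
    "/points/query",
    "embedding_config",
)


def count_markers(lines, markers=MARKERS):
    """Count how many lines contain each marker substring."""
    counts = Counter({marker: 0 for marker in markers})
    for line in lines:
        for marker in markers:
            if marker in line:
                counts[marker] += 1
    return dict(counts)
```

Running it over the log lines captured for one query on each version (for example, read from a saved log file) gives directly comparable counts of scroll and query calls.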
v0.6.25 (good) — /points/query, fewer cycles, system prompt preserved:
https://ibb.co/Ng0tSp1z
Still shows “hybrid” entries, but importantly uses /points/query (fast ANN top-k) and the assistant keeps the system prompt (name + structure ok).
Additional Information
On the latest Open WebUI build, pure BM25 (lexical) retrieval becomes extremely slow (~15 min per query) with 6–8 CPU spikes. In addition, when the reranker is enabled, the assistant loses the system prompt (e.g., “forgets its name” and ignores response structure).
On v0.6.25, the same setup finishes in <4 minutes, shows a single CPU plateau, and respects the system prompt even with the reranker turned on.
Impact
Massive latency increase (≈4 min → ≈15 min)
Multiple heavy CPU rounds (sawtooth profile) instead of one
System prompt not honored when reranker is enabled (answers deviate from required format and assistant name)
Environment
Open WebUI:
Working: v0.6.25
Regressed: latest (as of 2025-09-10)
Vector DB: Qdrant (CPU build), local at http://127.0.0.1:6333
Reranker model (external): qwen3:1.7b
Embedding model observed in logs: snowflake-arctic-embed2
Corpus size: hundreds of thousands of chunks/documents
Hardware: CPU-only for retrieval; GPUs present but unused for Qdrant in this test
@tjbck commented on GitHub (Sep 10, 2025):
@rgaricano
@tjbck commented on GitHub (Sep 10, 2025):
System prompt part being ignored is most likely caused by the retrieved context being much larger than what the model can handle.
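One way to check this hypothesis is to estimate whether the system prompt, the question, and the retrieved chunks fit in the model's context window together. The sketch below is a rough budget check, not Open WebUI's logic: the four-characters-per-token ratio, the reserve for the answer, and all names are assumptions, and real tokenizers vary by model and language.

```python
def estimate_tokens(text, chars_per_token=4):
    """Very rough token estimate using a fixed characters-per-token ratio."""
    return -(-len(text) // chars_per_token)  # ceiling division


def fits_context(system_prompt, chunks, question, context_window,
                 reserve_for_answer=512):
    """Return (fits, used_tokens, budget) for the assembled prompt."""
    used = (
        estimate_tokens(system_prompt)
        + estimate_tokens(question)
        + sum(estimate_tokens(chunk) for chunk in chunks)
    )
    budget = context_window - reserve_for_answer
    return used <= budget, used, budget
```

If the check fails for the configured context length, reducing the number or size of retrieved chunks, or raising the context length, would be the first things to try before treating the lost system prompt as a separate bug.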