[GH-ISSUE #18133] 0.6.33 - Focused Retrieval mode doesn't work
Originally created by @frenzybiscuit on GitHub (Oct 8, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/18133
Originally assigned to: @tjbck on GitHub.
Check Existing Issues
Installation Method
Pip Install
Open WebUI Version
0.6.33
Ollama Version (if applicable)
No response
Operating System
Debian 12
Browser (if applicable)
No response
Confirmation
Expected Behavior
Works as expected
Actual Behavior
Focused Retrieval mode doesn't work. Instead, it uses full context on all files in a knowledge base.
I can CONFIRM:
A) URLs load fine (which, IIRC, requires RAG to work)
B) Files upload fine to the knowledge base
C) The RAG backend gets hit when the files are uploaded, and it works.
This happens WITH and WITHOUT a reranker
Steps to Reproduce
Install LiteLLM + vLLM and use bge-large-en-v1.5 as the embedding model.
Set the Documents settings as shown (again, it happens WITH and WITHOUT the reranker). Top-K is set to 10.
When you insert a knowledge base into the chat, it retrieves -all- documents in full context mode and uses 20k+ context in chat.
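For anyone trying to reproduce this, a minimal sketch of a direct check against the embedding backend, assuming an OpenAI-compatible /v1/embeddings endpoint as exposed by vLLM or the LiteLLM proxy (host, port, API key, and model name are placeholders for your own setup):
curl http://localhost:8000/v1/embeddings \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-placeholder" \
  -d '{"model": "bge-large-en-v1.5", "input": "test sentence"}'
If this returns an embedding vector while knowledge-base retrieval still produces no traffic on the server, the problem is on the Open WebUI side rather than the backend.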
Logs & Screenshots
See above for screenshot
Additional Information
Also, large knowledge bases (700+ documents) don't work at all for retrieval.
@frenzybiscuit commented on GitHub (Oct 8, 2025):
Also, I've removed LiteLLM from the equation and have the same issue.
@REDWGioBrusca commented on GitHub (Oct 8, 2025):
I'm also having this issue. Downgrading to 0.6.22 fixes it. I wasn't getting any errors in my logs either; it just wasn't running the focused retrieval.
@tjbck commented on GitHub (Oct 8, 2025):
@silentoplayz could you confirm here?
@silentoplayz commented on GitHub (Oct 8, 2025):
@frenzybiscuit Could you share a fuller screenshot of your Documents settings for Open WebUI with LiteLLM removed from the equation?
Edit: Note that I do not use vLLM or LiteLLM myself, so my chances of reproducing this issue may be slim to none.
@frenzybiscuit commented on GitHub (Oct 8, 2025):
This is what it looks like when I use vLLM directly for embedding (the reranker is irrelevant, as the issue happens with/without):
@frenzybiscuit commented on GitHub (Oct 8, 2025):
This is what it looks like on vLLM's end (yes, the IP changed in this screenshot) during file upload to the knowledge base.
There is -no activity- on vLLM when retrieving from the knowledge base.
@silentoplayz commented on GitHub (Oct 8, 2025):
I believe I may have reproduced the issue without much work? I need your confirmation on this though. I would assume it shouldn't be retrieving all 115 documents in the knowledge collection if Full Context Mode is toggled off in the Documents admin settings for RAG. @frenzybiscuit
@frenzybiscuit commented on GitHub (Oct 8, 2025):
I made a video but can't upload it. If you are on Discord, ping me and I'll send it to you.
Basically the problem is this:
RAG retrieval does not work.
It forces full context mode (despite it not being selected) and loads -all- of the knowledge base into context, regardless of the Top-K setting.
The embedding server doesn't physically receive any requests from OWUI when this happens.
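One way to verify that last point, assuming the embedding server listens on port 5000 (as in the llama.cpp command shared below): run a packet capture on that port while sending a chat message with the knowledge base attached. If retrieval were working, each query would produce at least one embedding request.
sudo tcpdump -i any -n 'tcp port 5000'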
@frenzybiscuit commented on GitHub (Oct 8, 2025):
I'm aware this is being looked into, but I wanted to share the embedding command for llama.cpp so you guys can test it (since most of you likely use Ollama). This worked on the last OWUI version; it has the same problem now.
./llama.cpp*/build/bin/llama-server --host iphere --port 5000 -m ~/models/bge-large-en-v1.5.f16.gguf --embedding --pooling cls -ub 8192 --no-mmap --flash-attn on --api-key here --cont-batching --parallel 5
@PawelAnt commented on GitHub (Oct 8, 2025):
I confirm this: when even one small document is loaded into the Qdrant knowledge base, the question exceeds the context window from the prompt alone.
@theepicsaxguy commented on GitHub (Oct 8, 2025):
Exact same issue for me. It seems to force full context mode even though that is disabled.
@Classic298 commented on GitHub (Oct 8, 2025):
Can reproduce
@dotmobo commented on GitHub (Oct 8, 2025):
+1 same problem for me with Qdrant, Hybrid Search and external embedding and reranking engine
@AbdullahMPrograms commented on GitHub (Oct 8, 2025):
A similar issue exists for web search; instead of querying the search results, it returns a huge number of tokens (50k+) to the model. In 0.6.32, with the exact same settings, it returned ~5k tokens.
0.6.33: (screenshot: 47.8k tokens in prompt)
0.6.32: (screenshot: 6.2k tokens in prompt)
This is a regeneration of the same search prompt with the same settings across versions, but the behaviour is consistent with all web searches.
llama-server command:
./llama-server -m /home/victis/LLM/Models/unsloth/embeddinggemma-300m-GGUF/embeddinggemma-300M-Q8_0.gguf --embeddings -c 2048 -ngl 999 --flash-attn on
Document settings: (screenshot)
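As a sanity check that llama-server itself answers embedding requests, a sketch against its OpenAI-compatible endpoint (llama-server defaults to port 8080 when --port isn't given; it serves the single loaded model regardless of the model field, which is included only for API-schema compatibility):
curl http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "embeddinggemma-300m", "input": "test sentence"}'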
@Matten1887 commented on GitHub (Oct 8, 2025):
I've also been running into this issue since the last update.
@cableman commented on GitHub (Oct 8, 2025):
Using Qdrant as the vector store, it loads all documents in the model's knowledge base regardless of the "question" asked! It did not do that in 0.6.32.
@jamesottera commented on GitHub (Oct 9, 2025):
Having this same issue with the Postgres vector DB. This is causing a HUGE context increase and massive cost growth.
@arruga commented on GitHub (Oct 9, 2025):
Same issue here with v0.6.33 on Ubuntu 22.04. It didn't happen with v0.6.32. I have 3674 source files across 4 knowledge collections. When I ask something in the chat, all 3674 sources are retrieved (there is a temporary message indicating sources_other). I'm using the default ChromaDB. This happens with and without hybrid search, with "full context mode" off and Top-K = 10.
@mahenning commented on GitHub (Oct 9, 2025):
I PR'ed a fix: https://github.com/open-webui/open-webui/pull/18182.
@alpilotx commented on GitHub (Oct 9, 2025):
Indeed, it seems to fix it (I quickly applied the changed file in a test env, and retrieval now seems to be back to "normal" again, not returning all files).
@jamesottera commented on GitHub (Oct 9, 2025):
Given the seriousness of this issue, causing huge context increases with a large financial impact, can this be merged and rolled out in a release today?
Otherwise, is there any breaking change in 0.6.33 that would cause database issues if we rolled our servers back to 0.6.32? I was hesitant to do that because of any migrations that may have run.
@dotmobo commented on GitHub (Oct 9, 2025):
@jamesottera For info, I rolled back to 0.6.32 and I don't have any issues.
@silentoplayz commented on GitHub (Oct 9, 2025):
Testing is wanted on the dev branch to see if the issue reported here has been solved or not! c4832fdb70
@Classic298 commented on GitHub (Oct 9, 2025):
Might be fixed on dev > c4832fdb70
@theepicsaxguy commented on GitHub (Oct 10, 2025):
I have tested ghcr.io/open-webui/open-webui:git-c4832fd-slim and confirm it is fixed.
@silentoplayz commented on GitHub (Oct 10, 2025):
Testing from a contributor internally has also revealed that this issue has most likely been fixed/solved. I'll close this issue, but feel free to add a comment (whoever may see this) if you're still having issues!
@deliciousbob commented on GitHub (Oct 13, 2025):
I've just tested ghcr.io/open-webui/open-webui:git-c4832fd-slim and I can also confirm that retrieval is working normally again.
Instead of 413 sources, it retrieved 10 sources, as limited by reranking. Thank you for the fix!
@nlamarque42 commented on GitHub (Oct 13, 2025):
Big issue indeed. We are speedrunning the 1T Tokens of Appreciation OpenAI Trophy with this one.
@jamesottera commented on GitHub (Oct 13, 2025):
Given the severity of this issue from a performance and cost standpoint, and with it confirmed fixed 3 days ago, can this please be merged to main ASAP? This is a regression, not a feature request or a minor bug.
@ypsilonkah commented on GitHub (Oct 16, 2025):
Switching to ghcr.io/open-webui/open-webui:git-c4832fd-slim reverts all settings changes and chats of the last 14 or so days. Or is it just on my side here?
@mahenning commented on GitHub (Oct 16, 2025):
As @jamesottera and others already wrote, I urge a release of this fix as fast as possible. Make 0.6.34 just this fix if you have to, or at least mark the broken version as yanked on PyPI, as was done for e.g. vLLM v0.2.1. I don't understand why a fix was pushed only 1 day later, yet it is not on main for over a week.
@Classic298 commented on GitHub (Oct 16, 2025):
I can't answer your questions precisely but 0.6.34 is probably close to releasing.
Either way, if you struggle with this issue, downgrading is a valid option.
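For pip installs (the installation method in the original report), pinning the last known-good version should be as simple as the following; 0.6.32 is the version commenters above rolled back to, so verify against the release notes for your setup:
pip install open-webui==0.6.32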
@czar commented on GitHub (Oct 16, 2025):
I'm using Watchtower for updates (using docker run --rm --volume /var/run/docker.sock:/var/run/docker.sock containrrr/watchtower --run-once open-webui) and now I'm trying to downgrade, but I'm not quite sure how to do it. Any help with downgrading my open-webui to before this bug was present would be awesome. Thanks!
@mahenning commented on GitHub (Oct 16, 2025):
@czar Watchtower only updates the image/container by watching the tag for changes. If you used the docker run command from the main GitHub page, the image used is ghcr.io/open-webui/open-webui:main. Change the open-webui:main part to e.g. open-webui:0.6.32 to pin the image to the version just before the latest (0.6.33) and recreate the container (a full command is sketched at the end of this comment).
@Classic298 I know that downgrading is a valid option, but most people using Open WebUI won't become aware of this issue until after burning through maybe a few 100k tokens with an external API. I just wanted to voice that I'm unhappy that the fix is already on dev but not published. There have been 2+ releases on the same day in the past.
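A minimal sketch of the pinned command, assuming the standard docker run invocation from the README (adjust the port and volume mappings to your own setup):
docker run -d -p 3000:8080 \
  -v open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:0.6.32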
@jamesottera commented on GitHub (Oct 16, 2025):
Agreed with @mahenning. We are among the few who noticed this and have been vocal about getting it addressed.
There are countless others who are not aware of it, and I'm sure many users of this repo run Docker on the main tag with no version pinning.
It should not be assumed that every user monitors every release; some may be surprised by unexpected bills in the thousands of dollars.
This isn't a trivial bug; it's one with great financial impact, and for some companies using this software it may be the kind of thing that moves them away from it. This isn't only being used by casual home users.
Downgrading (if you are aware of the issue) is an option, and it's what my team did, but downgrading sometimes isn't possible if an update included database migrations, and it could break a deployment.
It is understood that there is some kind of release cycle with other changes that need testing, but this single commit could be cherry-picked.