[GH-ISSUE #3129] feat: whole/full document mode #51807
Originally created by @PkmX on GitHub (Jun 13, 2024).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/3129
Originally assigned to: @tjbck on GitHub.
Is your feature request related to a problem? Please describe.
Sometimes it may be ideal to pass the whole document to the LLM for tasks like summarization. Currently, if you upload a document and ask the LLM to summarize it, the RAG step will most likely return no results. Since the user can't turn off the retrieval step, the LLM becomes basically useless in this rather common use case. Passing the full document is also useful for tasks like translation or sentiment analysis.
Describe the solution you'd like
When using an uploaded document or fetched webpage, there should be a checkbox allowing the user to pass the entire ingested document as the context. This should be a relatively easy change: just skip retrieval and pass the full text straight to the query.
Ideally, there should be a warning if the content is larger than the LLM's context size, so truncation or degraded output does not occur.
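Roughly the kind of check I mean (a sketch only, not an actual implementation; tiktoken is used here just as an example tokenizer, and the window size and headroom are placeholder numbers):

```python
import tiktoken  # example tokenizer; other models tokenize differently

CONTEXT_WINDOW = 8192   # placeholder: context size of the target model, in tokens
REPLY_HEADROOM = 1024   # placeholder: tokens reserved for the model's answer

def fits_in_context(document: str) -> bool:
    """Return True if the whole document fits inside the context window."""
    enc = tiktoken.get_encoding("cl100k_base")
    return len(enc.encode(document)) <= CONTEXT_WINDOW - REPLY_HEADROOM

document = "..."  # the ingested file or webpage text
if not fits_in_context(document):
    print("Warning: document exceeds the context window; "
          "it may be truncated or the output may degrade.")
```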
Describe alternatives you've considered
Additional context
This feature is also present in some other chat UIs, e.g., “document pinning” in AnythingLLM.
I'm not sure how this will interact with pipelines, since pipelines can basically do anything. I think such a flag could also be passed to compatible RAG pipelines so they know whether the user wants to perform context retrieval or a full-context insertion.
@bunnyfu commented on GitHub (Jun 14, 2024):
Full support; with 32-128k context models now being more common, sometimes it's just easier and more foolproof to pass the whole document into the context if you want a summary.
@Peter-De-Ath commented on GitHub (Jun 14, 2024):
Yes I like this.
I currently open the document, paste it into the chat, and then edit the messages to get the prompting right for the task I want.
For summarization I have a modelfile set up just for summarizing: I can paste in my entire document (text) and get back just the summary.
@rvkwi commented on GitHub (Jul 22, 2024):
That would be really useful for several edge cases. I recently ran into this not with summaries but with some analysis: it was not about the content but about patterns in it (so the entire context had to survive at once). I ended up copying and pasting a lot, and eventually threw it into Claude because this got tedious quickly.
RAG is amazing, but there are cases where a simple 1:1 is all you need.
@pbasov commented on GitHub (Jul 23, 2024):
With llama3.1 supporting 128k context, this would be an important feature to have.
@justinh-rahb commented on GitHub (Jul 23, 2024):
I think a lot of people are going to soon realize that Ollama doesn't actually run models at their full context size by default, and that if you wanted to run even 8B at 128K you'd need over 100GB of VRAM...
@rvkwi commented on GitHub (Jul 24, 2024):
Just because some can't fit it doesn't mean others can't. num_ctx is not exactly hidden or advanced to understand, and you don't have to max it out just because you can. On top of that, not everything uses standard attention that is quadratic in sequence length.
Open WebUI even has it more clearly worded as "context length" in the model file.
I'm just saying: RAG is great, but there are also plenty of uses where you specifically don't want to cut a context insertion apart. A lot of these cases don't even need a massive context, but it's not unreasonable that people run high contexts with the WebUI. I think it would be a helpful feature to have as a toggle.
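For anyone unsure where num_ctx lives: it can be set per request through Ollama's REST API (or with PARAMETER num_ctx in a modelfile). A minimal sketch; the model name, prompt, and context value are placeholders:

```python
import requests

# Ask Ollama to run this single request with a 32k context window.
# The model must actually support the requested length, and memory
# use grows with it, as discussed above.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.1",         # placeholder model name
        "prompt": "Summarize: ...",  # elided document text
        "stream": False,
        "options": {"num_ctx": 32768},
    },
)
print(resp.json()["response"])
```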
@justinh-rahb commented on GitHub (Jul 24, 2024):
I wasn't arguing against it @rvkwi, you can see that I did thumbs-up the OP. You and I may know full well how num_ctx works, but there are a LOT of WebUI users that don't understand that.
@formigarafa commented on GitHub (Jul 30, 2024):
Is it possible to bypass the current RAG system? I mean, having the documents inserted in the chat but opting not to use/apply them through the built-in RAG system.
ATM, the biggest issue is that it is hard to integrate any alternative with the built-in RAG in the way.
I have not checked yet whether this is available, but with that in hand, and with an API to retrieve/read the documents, we could implement alternative methods of context insertion with filters, pipes, tools, or whatever we dream up.
@justinh-rahb commented on GitHub (Jul 30, 2024):
Example for Tools:
From: https://openwebui.com/t/hub/get_files
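I'm not reproducing the linked tool verbatim, but the general shape of an Open WebUI tool is a Tools class whose typed, documented methods the model can call. A minimal sketch that hands raw file text back to the model; the file-reading path is an assumption, and the real get_files tool may resolve files through Open WebUI's own storage instead:

```python
class Tools:
    def get_file_contents(self, file_path: str) -> str:
        """
        Return the full text of an uploaded file so the model can use it
        directly, bypassing retrieval.

        :param file_path: Path to the uploaded file on the server (assumed
                          to be plain text and readable by the tool).
        """
        with open(file_path, "r", encoding="utf-8") as f:
            return f.read()
```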
@pomoke commented on GitHub (Aug 10, 2024):
If we cannot fit the whole file into the context, we will need to take some other approach, like memGPT.
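Short of memGPT, the simplest fallback is map-reduce summarization: summarize fixed-size chunks, then summarize the summaries. A sketch, with llm standing in for whatever completion call is available (a hypothetical helper, not an Open WebUI API):

```python
def summarize_long(document: str, llm, chunk_chars: int = 12000) -> str:
    """Map-reduce summarization for documents larger than the context window."""
    # Map: summarize each chunk independently.
    chunks = [document[i:i + chunk_chars] for i in range(0, len(document), chunk_chars)]
    partials = [llm(f"Summarize this excerpt:\n\n{chunk}") for chunk in chunks]
    # Reduce: merge the partial summaries into one final summary.
    return llm("Combine these partial summaries into one:\n\n" + "\n\n".join(partials))
```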
@formigarafa commented on GitHub (Aug 11, 2024):
ATM, what happens when the given prompt (and chat history) does not fit the context? I mean, this problem already exists, doesn't it?
@pomoke commented on GitHub (Aug 13, 2024):
However, the default RAG still seems to be enabled with this and .citation = True.
@NeverOccurs commented on GitHub (Sep 10, 2024):
Try this:
https://github.com/NeverOccurs/full_doc_pipe
Use it as a filter function in Open WebUI. It allows uploading a file and pushing it fully into the system prompt. Then you can directly ask the LLM to summarize it or whatever. It doesn't support changing the file midway through the chat, though.
I'm still optimizing it, but for now I think it is enough for my use case. I'm trying to incorporate it with Claude's prompt caching to minimize cost.
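The repo above is the actual implementation; the core move is roughly the following (a sketch assuming Open WebUI's Filter convention of an inlet(body) hook that can rewrite the request before it reaches the model; the files key and its nested content field are guesses about the payload shape and may differ between versions):

```python
class Filter:
    def inlet(self, body: dict) -> dict:
        """Move uploaded file text into the system prompt, bypassing RAG."""
        # Assumption: uploaded files arrive in body["files"] with their
        # extracted text under ["file"]["data"]["content"].
        docs = [
            f["file"]["data"]["content"]
            for f in body.get("files", [])
            if f.get("file", {}).get("data", {}).get("content")
        ]
        if docs:
            system_msg = {
                "role": "system",
                "content": "Full document(s):\n\n" + "\n\n".join(docs),
            }
            body["messages"] = [system_msg] + body.get("messages", [])
            body.pop("files", None)  # keep the built-in RAG from also firing
        return body
```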
@tjbck commented on GitHub (Sep 29, 2024):
The full document mode toggle option has been added to the latest dev build; you can find the toggle by clicking on the file item component. Testing wanted here!
@daniel-dona commented on GitHub (Oct 13, 2024):
Looks like the context is not being passed to the LLM with the "full document" option.
The query received by Ollama:
Used document:
BK72XX_SDK_User_Manual-3.0.3-1.pdf
@daniel-dona commented on GitHub (Oct 13, 2024):
Nope, not a single 40X error found in the Developer Console; I don't think it is a CORS problem.
@a-mart commented on GitHub (Oct 21, 2024):
Seeing the same issue. Only handing 5 random pages of the document to the model out of roughly 50.
@a-mart commented on GitHub (Oct 28, 2024):
I've also confirmed that I'm not seeing the errors from #291 but still experiencing the same issue
@maurizioaiello commented on GitHub (May 1, 2025):
How does it work? Can you explain better how to use it?
@jayendra19 commented on GitHub (Nov 3, 2025):
If I have a PDF and I upload it, when I say "summarise" it won't be able to do it. How can I enable it?
@weathon commented on GitHub (Jan 26, 2026):
The full text mode seems to work for me, but is there a way to make it the default?
@curious-broccoli commented on GitHub (Jan 27, 2026):
I agree, it works, but it is rather annoying to use. Mainly, you first need to send a message with the uploaded file selected before you can enable full document mode. Before a message is sent, the file does not show up in the (settings) sidebar. Maybe this is also a bug.
EDIT: apparently you can already click the file while it is shown in the input to toggle the mode, so ignore what I wrote above
@curious-broccoli commented on GitHub (Jan 27, 2026):
@weathon There seems to be a way, but it might affect more than you wish.