feat: whole/full document mode #1245

@Peter-De-Ath commented on GitHub (Jun 14, 2024):

Yes I like this.
I currently open the document then paste it into chat, then do some editing of the messages to get the prompting right for the task i want.

on Summarization I have a modelfile which was setup just for summarizing which i can just paste my entire document (text) and return just the summary

@Peter-De-Ath commented on GitHub (Jun 14, 2024): Yes I like this. I currently open the document then paste it into chat, then do some editing of the messages to get the prompting right for the task i want. on Summarization I have a modelfile which was setup just for summarizing which i can just paste my entire document (text) and return just the summary

GiteaMirror commented

@rvkwi commented on GitHub (Jul 22, 2024):

That would be really useful for several edge cases, recently ran into this not with summaries but some analysis. It was not about the content but patterns in it (so the entire context was important to survive at once). Ended up copying and pasting a lot, and eventually throwing it into Claude because this got tedious quickly.

RAG is amazing, but there's cases where a simple 1:1 is all you need.

@rvkwi commented on GitHub (Jul 22, 2024): That would be really useful for several edge cases, recently ran into this not with summaries but some analysis. It was not about the content but patterns in it (so the entire context was important to survive at once). Ended up copying and pasting a lot, and eventually throwing it into Claude because this got tedious quickly. RAG is amazing, but there's cases where a simple 1:1 is all you need.

GiteaMirror commented

@pbasov commented on GitHub (Jul 23, 2024):

With llama3.1 supporting 128k context this would be an important feature to have

@pbasov commented on GitHub (Jul 23, 2024): With llama3.1 supporting 128k context this would be an important feature to have

GiteaMirror commented

2025-11-11 14:40:59 -06:00

@justinh-rahb commented on GitHub (Jul 23, 2024):

With llama3.1 supporting 128k context this would be an important feature to have

I think a lot of people are going to soon realize that Ollama doesn't actually run models at their full context size by default, and that if you wanted to run even 8B at 128K you'd need over 100GB of VRAM...

@justinh-rahb commented on GitHub (Jul 23, 2024): > With llama3.1 supporting 128k context this would be an important feature to have I think a lot of people are going to soon realize that Ollama doesn't actually run models at their full context size by default, and that if you wanted to run even 8B at 128K you'd need over 100GB of VRAM...

GiteaMirror commented

@rvkwi commented on GitHub (Jul 24, 2024):

With llama3.1 supporting 128k context this would be an important feature to have

I think a lot of people are going to soon realize that Ollama doesn't actually run models at their full context size by default, and that if you wanted to run even 8B at 128K you'd need over 100GB of VRAM...

just because some can't fit it, doesn't mean others can't. num_ctx is not exactly hidden or advanced to understand, and you don't have to max it out just because it can. on top not everything uses standard quadratic attention to sequence length.

open webui even has it more clearly worded as context length in the model file.
i'm just saying, RAG is great, but there are also plenty of uses where you specifically don't want to cut a context insertion apart. a lot of these cases don't even need a massive context, but it's not unreasonable that people run high contexts with the webui. i think it would be a helpful feature to have as a toggle.

@rvkwi commented on GitHub (Jul 24, 2024): > > With llama3.1 supporting 128k context this would be an important feature to have > > I think a lot of people are going to soon realize that Ollama doesn't actually run models at their full context size by default, and that if you wanted to run even 8B at 128K you'd need over 100GB of VRAM... just because some can't fit it, doesn't mean others can't. num_ctx is not exactly hidden or advanced to understand, and you don't have to max it out just because it can. on top not everything uses standard quadratic attention to sequence length. open webui even has it more clearly worded as context length in the model file. i'm just saying, RAG is great, but there are also plenty of uses where you specifically don't want to cut a context insertion apart. a lot of these cases don't even need a massive context, but it's not unreasonable that people run high contexts with the webui. i think it would be a helpful feature to have as a toggle.

GiteaMirror commented

2025-11-11 14:40:59 -06:00

@justinh-rahb commented on GitHub (Jul 24, 2024):

I wasn't arguing against it @rvkwi, you can see that I did thumbs-up the OP. You and I may know how num_ctx works full-well, but there are a LOT of WebUI users that don't understand that.

@justinh-rahb commented on GitHub (Jul 24, 2024): I wasn't arguing against it @rvkwi, you can see that I did thumbs-up the OP. You and I may know how `num_ctx` works full-well, but there are a LOT of WebUI users that don't understand that.

GiteaMirror commented

2025-11-11 14:40:59 -06:00

@formigarafa commented on GitHub (Jul 30, 2024):

Is it possible to bypass the current RAG system? I mean, having the documents inserted in the chat but opt to not use/apply them through the built-in RAG system.

ATM, the biggest issue is that it is hard to integrate any alternative with a RAG on the way.

I did not check but yet if it is available, but with that in hand and with an api to retrieve/read the documents we could implement alternative methods of context insertion with filters, pipes, tools or whatever we dream.

@formigarafa commented on GitHub (Jul 30, 2024): Is it possible to bypass the current RAG system? I mean, having the documents inserted in the chat but opt to not use/apply them through the built-in RAG system. ATM, the biggest issue is that it is hard to integrate any alternative with a RAG on the way. I did not check but yet if it is available, but with that in hand and with an api to retrieve/read the documents we could implement alternative methods of context insertion with filters, pipes, tools or whatever we dream.

GiteaMirror commented

2025-11-11 14:41:00 -06:00

@justinh-rahb commented on GitHub (Jul 30, 2024):

Is it possible to bypass the current RAG system? I mean, having the documents inserted in the chat but opt to not use/apply them through the built-in RAG system.

Example for Tools:

class Tools:
    def __init__(self):
        # If set to true it will prevent default RAG pipeline
        self.file_handler = True

From: https://openwebui.com/t/hub/get_files

@justinh-rahb commented on GitHub (Jul 30, 2024): > Is it possible to bypass the current RAG system? I mean, having the documents inserted in the chat but opt to not use/apply them through the built-in RAG system. Example for Tools: ```python class Tools: def __init__(self): # If set to true it will prevent default RAG pipeline self.file_handler = True ``` From: https://openwebui.com/t/hub/get_files

GiteaMirror commented

2025-11-11 14:41:00 -06:00

@pomoke commented on GitHub (Aug 10, 2024):

If we cannot fit the whole file into the ctx, we will need to take some other approaches like memGPT.

@pomoke commented on GitHub (Aug 10, 2024): If we cannot fit the whole file into the ctx, we will need to take some other approaches like memGPT.

GiteaMirror commented

@formigarafa commented on GitHub (Aug 11, 2024):

If we cannot fit the whole file into the ctx, we will need to take some other approaches like memGPT.

ATM, what happens the given prompt (and chat history) does not fit the context? I mean, this problem already exists, doesn't it?

@formigarafa commented on GitHub (Aug 11, 2024): > If we cannot fit the whole file into the ctx, we will need to take some other approaches like memGPT. ATM, what happens the given prompt (and chat history) does not fit the context? I mean, this problem already exists, doesn't it?

GiteaMirror commented

@pomoke commented on GitHub (Aug 13, 2024):

Is it possible to bypass the current RAG system? I mean, having the documents inserted in the chat but opt to not use/apply them through the built-in RAG system.

Example for Tools:
class Tools:
    def __init__(self):
        # If set to true it will prevent default RAG pipeline
        self.file_handler = True
From: https://openwebui.com/t/hub/get_files

However, the default RAG is still seems to be enabled with this and .citation = True.

@pomoke commented on GitHub (Aug 13, 2024): > > Is it possible to bypass the current RAG system? I mean, having the documents inserted in the chat but opt to not use/apply them through the built-in RAG system. > > Example for Tools: > > ```python > class Tools: > def __init__(self): > # If set to true it will prevent default RAG pipeline > self.file_handler = True > ``` > > From: https://openwebui.com/t/hub/get_files However, the default RAG is still seems to be enabled with this and `.citation = True`.

GiteaMirror commented

@NeverOccurs commented on GitHub (Sep 10, 2024):

Try this:
https://github.com/NeverOccurs/full_doc_pipe
Use it as a filter function in Open-Webui. It allows uploading a file and push it fully into system prompt. Then you can direct ask the LLM to summarize it or whatever. It doesn't support changing file midway through the chat though.
Still optimizing it but for now I think it is enough for my use case. I'm trying to incorporating it with Claude's prompt caching to minimize cost.

@NeverOccurs commented on GitHub (Sep 10, 2024): Try this: https://github.com/NeverOccurs/full_doc_pipe Use it as a filter function in Open-Webui. It allows uploading a file and push it fully into system prompt. Then you can direct ask the LLM to summarize it or whatever. It doesn't support changing file midway through the chat though. Still optimizing it but for now I think it is enough for my use case. I'm trying to incorporating it with Claude's prompt caching to minimize cost.

GiteaMirror commented

@tjbck commented on GitHub (Sep 29, 2024):

Full document mode toggle option has been added to latest dev, you can find the toggleable option by clicking on the file item component. Testing wanted here!

@tjbck commented on GitHub (Sep 29, 2024): <img width="1047" alt="image" src="https://github.com/user-attachments/assets/75e205e7-c240-4cc1-9c1c-7449ab0818c5"> Full document mode toggle option has been added to latest dev, you can find the toggleable option by clicking on the file item component. Testing wanted here!

GiteaMirror commented

BK72XX_SDK_User_Manual-3.0.3-1.pdf

@daniel-dona commented on GitHub (Oct 13, 2024):

Looks like the context is not being passed to the LLM with the "full document" option.

The query received by Ollama:

ollama  | time=2024-10-13T14:43:54.332Z level=DEBUG source=routes.go:1422 msg="chat request" images=0 prompt="<|im_start|>system
You are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>
<|im_start|>user
You are given a user query, some textual context and rules, all inside xml tags. You have to answer the query based on the context while respecting the rules.

<context>
BK72XX_SDK_User_Manual-3.0.3-1.pdf:
</context>

<rules>
- If you don't know, just say so.
- If you are not sure, ask for clarification.
- Answer in the same language as the user query.
- If the context appears unreadable or of poor quality, tell the user then answer as best as you can.
- If the answer is not in the context but you think you know the answer, explain that to the user then answer with your own knowledge.
- Answer directly and without using xml tags.
</rules>

<user_query>
Summarize this
</user_query>

Summarize this<|im_end|>
<|im_start|>assistant

Used document:

@daniel-dona commented on GitHub (Oct 13, 2024): Looks like the context is not being passed to the LLM with the "full document" option. ![imagen](https://github.com/user-attachments/assets/ab34b8fc-c869-4049-8ad3-e746bb29de29) ![imagen](https://github.com/user-attachments/assets/bea8b28d-f1b9-42a7-846e-a23fdc1b400d) ![imagen](https://github.com/user-attachments/assets/6ce550d5-737e-4d8f-86f7-7905ff7fb7b1) The query received by Ollama: ``` ollama | time=2024-10-13T14:43:54.332Z level=DEBUG source=routes.go:1422 msg="chat request" images=0 prompt="<|im_start|>system You are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|> <|im_start|>user You are given a user query, some textual context and rules, all inside xml tags. You have to answer the query based on the context while respecting the rules. <context> BK72XX_SDK_User_Manual-3.0.3-1.pdf: </context> <rules> - If you don't know, just say so. - If you are not sure, ask for clarification. - Answer in the same language as the user query. - If the context appears unreadable or of poor quality, tell the user then answer as best as you can. - If the answer is not in the context but you think you know the answer, explain that to the user then answer with your own knowledge. - Answer directly and without using xml tags. </rules> <user_query> Summarize this </user_query> Summarize this<|im_end|> <|im_start|>assistant ``` Used document: [BK72XX_SDK_User_Manual-3.0.3-1.pdf](https://github.com/user-attachments/files/17356037/BK72XX_SDK_User_Manual-3.0.3-1.pdf)

GiteaMirror commented

@daniel-dona commented on GitHub (Oct 13, 2024):

@daniel-dona It sounds like you are also facing #291 (please check the logs and confirm).

Nope, not a single 40X error found in the Developer Console, I don't think is a CORS problem

@daniel-dona commented on GitHub (Oct 13, 2024): > @daniel-dona It sounds like you are also facing #291 (please check the logs and confirm). Nope, not a single 40X error found in the Developer Console, I don't think is a CORS problem

GiteaMirror commented

@a-mart commented on GitHub (Oct 21, 2024):

Seeing the same issue. Only handing 5 random pages of the document to the model out of roughly 50.

@a-mart commented on GitHub (Oct 21, 2024): Seeing the same issue. Only handing 5 random pages of the document to the model out of roughly 50.

GiteaMirror commented

@a-mart commented on GitHub (Oct 28, 2024):

I've also confirmed that I'm not seeing the errors from #291 but still experiencing the same issue

@a-mart commented on GitHub (Oct 28, 2024): I've also confirmed that I'm not seeing the errors from #291 but still experiencing the same issue

GiteaMirror commented