RAG is only used on the first chat message #1450

GiteaMirror · 2025-11-11T14:45:27-06:00

GiteaMirror commented

2025-11-11 14:45:27 -06:00

Originally created by @aleixdorca on GitHub (Jul 6, 2024).

Bug Report

Description

Bug Summary:
Open WebUI only applies RAG (Retrieval Augmented Generation) to the first message of a conversation. From the second message onwards, the response no longer seems to be based on the retrieved document context.

Steps to Reproduce:
Start a new chat, upload a file and ask a question. Docker and Ollama log the normal RAG behaviour (with the proper RAG prompt). From the second message RAG is not used.

Expected Behavior:
RAG should be used on all questions, shouldn't it?

Environment

Open WebUI Version: 0.3.7
Ollama (if applicable): 0.1.48
Operating System: debian+docker
Browser (if applicable): Chrome 126.0.6478.127

Reproduction Details

Confirmation:

- [X] I have read and followed all the instructions provided in the README.md.
- [X] I am on the latest version of both Open WebUI and Ollama.
- [ ] I have included the browser console logs.
- [X] I have included the Docker container logs.

Logs and Screenshots

Docker Container Logs:

The docker logs show when the document is uploaded and embedded. For the first question RAG is shown in the Docker Logs as:

Use the following context as your learned knowledge, inside <context></context> XML tags.
<context>
REDACTED (but it is ok)
</context>

When answer to user:
- If you don't know, just say that you don't know.
- If you don't know when you are not sure, ask for clarification.
Avoid mentioning that you obtained the information from the context.
And answer according to the language of the user's question.

Given the context information, answer the query.
Query: My question

Ollama logs the query as well. On the second message, however, Ollama logs only the plain user message; no RAG prompt is used at all.
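To make the difference between the two logged requests concrete, here is a minimal sketch of how a prompt like the one above could be assembled. It is not Open WebUI's actual code: `RAG_TEMPLATE` is shortened from the logged prompt, and `build_prompt` is a hypothetical helper. It assumes that the bare message is sent when no chunks are retrieved, which matches what the logs show from the second message on.

```python
# Shortened from the prompt seen in the Docker logs; the placeholder names are assumptions.
RAG_TEMPLATE = """Use the following context as your learned knowledge, inside <context></context> XML tags.
<context>
{context}
</context>

Given the context information, answer the query.
Query: {query}"""


def build_prompt(query: str, chunks: list[str]) -> str:
    """Hypothetical helper: wrap the query in the RAG template only if chunks were retrieved."""
    if not chunks:
        # No retrieved context: the model receives the bare user message.
        return query
    return RAG_TEMPLATE.format(context="\n".join(chunks), query=query)


# First message: the uploaded file yields chunks, so the template is applied.
first = build_prompt("My question", ["chunk A", "chunk B"])
# Second message: nothing is retrieved, so only the plain message is sent.
second = build_prompt("My follow-up question", [])
```

Under these assumptions, `first` contains the `<context>` block while `second` is just the follow-up text.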

Installation Method

The project was installed using Docker

GiteaMirror closed this issue

2025-11-11 14:45:27 -06:00

GiteaMirror commented

2025-11-11 14:45:27 -06:00

@aleixdorca commented on GitHub (Jul 6, 2024):

I have also tried with different models (mistral, llama3, gemma2), just in case. Same behaviour.

GiteaMirror commented

2025-11-11 14:45:27 -06:00

@silentoplayz commented on GitHub (Jul 6, 2024):

This is not a bug, but rather a deliberate change in how RAG handles uploaded documents. Since the introduction of the Knowledge feature, the default behavior has been updated. Now, uploaded documents are only considered within the context of a single message. To restore the previous functionality, you can enable uploaded documents or collections of documents as Knowledge for a model file in the Models section of the Workspace. This allows the model file to retain knowledge of the documents from the initial message onwards, eliminating the need to manually add documents to each query during a chat session with the model.
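The scoping described above can be pictured with a small sketch. This is only an illustration, not the real implementation: `Message`, `files_for_retrieval`, and the `model_knowledge` list are hypothetical names. It assumes that retrieval for a turn looks only at the files attached to that turn plus whatever is configured as Knowledge on the model.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    """Hypothetical chat message with the files attached to that single turn."""
    text: str
    files: list[str] = field(default_factory=list)


def files_for_retrieval(history: list[Message], model_knowledge: list[str]) -> list[str]:
    """Return the files consulted for the latest message under the assumed scoping.

    Uploads on earlier turns are ignored; model-level Knowledge always applies.
    """
    latest = history[-1]
    # dict.fromkeys removes duplicates while keeping insertion order.
    return list(dict.fromkeys(model_knowledge + latest.files))


chat = [Message("First question", ["report.pdf"]), Message("Follow-up question")]

first_turn = files_for_retrieval(chat[:1], model_knowledge=[])
second_turn = files_for_retrieval(chat, model_knowledge=[])
with_knowledge = files_for_retrieval(chat, model_knowledge=["report.pdf"])
```

In this sketch `second_turn` is empty, which mirrors the behaviour reported in this issue, while `with_knowledge` still includes `report.pdf` on the follow-up turn.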

GiteaMirror commented

2025-11-11 14:45:28 -06:00

@aleixdorca commented on GitHub (Jul 6, 2024):

Thanks for answering and closing the bug report.

I don't get it, though. The way you put it means (please correct me if I am wrong):

- Only administrators can upload documents to the Knowledge model section.
- Regular users have no access to the Workspace, so this is limited to some specific users.
- Whenever a user wants to chat with a document or webpage (the same thing happens with # elements), they have to keep re-uploading it (even if it is not re-embedded, which I understand). This is unbearably tedious.
- If, for whatever reason, they forget to re-upload the document or webpage, the answer is completely hallucinated.

This breaks a major feature of Open WebUI, in my opinion.

To add to this, the setup we are testing at our university gives access to 50 users, none with admin rights. Admins should add the company's information, this I get, but for casual documents, users should have more control and access to the RAG feature.

GiteaMirror commented

2025-11-11 14:45:28 -06:00

@silentoplayz commented on GitHub (Jul 6, 2024):

I understand your concerns and appreciate you breaking down the limitations of the current implementation of RAG within Open WebUI.

You are correct that:

- Only administrators can upload documents to the Knowledge model section. This isn't inherently a new issue, as both the Documents and Models sections have always been limited to administrative configuration.
- Regular users don't have access to the Workspace, which means they can't manage documents or the recently added model file knowledge, which may seem even more restrictive.

In addition to these existing limitations, the recent change to RAG's handling of uploaded documents has introduced new challenges. You're right that:

- Users need to reupload documents or link a URL for each chat session, which can be inconvenient.
- Forgetting to reupload the document or link a URL can lead to hallucinated answers with a weaker model, which can be frustrating for users and undermine the trust they have in the RAG system.

Speaking for many, I acknowledge that this change may have dealt a blow to a major feature of Open WebUI from the perspective of some users, and we should revisit the design to make it more user-friendly and accessible.

That said, the Open WebUI team is aware of the need for a more flexible solution that allows users to manage their own documents without relying on administrators. This area is actively being worked on, and we're excited to introduce "teams" as an upcoming feature. Related - https://github.com/open-webui/open-webui/issues/2924

GiteaMirror commented

2025-11-11 14:45:28 -06:00

@aleixdorca commented on GitHub (Jul 6, 2024):

It's great to hear that you understand the concerns regarding the recent changes to RAG in Open WebUI.

You've accurately outlined the issues, including the limitations for regular users, the inconvenience of reuploading documents per session, and the potential for inaccurate responses due to missing document links.

It's reassuring to know that the Open WebUI team is aware of these challenges and is actively working on a solution. The introduction of "teams" in an upcoming feature seems promising and could address many of the current limitations.

I appreciate your constructive feedback and your willingness to engage in this discussion.

I will keep an eye on future updates.

GiteaMirror commented

2025-11-11 14:45:28 -06:00

@Qualzz commented on GitHub (Jul 11, 2024):

It's very difficult to have a chat over a document, as the LLM doesn't create its own query.
Thus you need to write every keyword in every message.

Example:
User: Can you retrieve the frame data for whatever here:
AI: Here is the data
User: Can you also display the images as markdown?
AI: I don't fucking know what you're talking about -> because the retrieval query will be just "Can you also display the images as markdown?"
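One naive way around the problem in the example above is to build the retrieval query from the last few user turns instead of only the latest one. The sketch below is a simple stand-in for having the LLM rewrite the query, and it is not something Open WebUI is confirmed to do; `retrieval_query` and its `window` parameter are hypothetical.

```python
def retrieval_query(user_turns: list[str], window: int = 2) -> str:
    """Hypothetical helper: join the last `window` user messages into one retrieval query.

    This keeps keywords from earlier turns (e.g. "frame data") available to the
    retriever when the follow-up message does not repeat them.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    recent = user_turns[-window:]
    return " ".join(turn.strip() for turn in recent)


turns = [
    "Can you retrieve the frame data for whatever here:",
    "Can you also display the images as markdown?",
]

# Default window: the follow-up is searched together with the previous request.
combined = retrieval_query(turns)
# window=1 reproduces the behaviour described above: only the latest message is used.
latest_only = retrieval_query(turns, window=1)
```

The trade-off is that older turns can pull in irrelevant chunks once the conversation changes topic, which is why a rewritten, self-contained query is usually the better fix.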

GiteaMirror commented

2025-11-11 14:45:29 -06:00

@silentoplayz commented on GitHub (Jul 11, 2024):

> It's very difficult to have a chat over a document, as the LLM doesn't create its own query. Thus you need to write every keyword in every message.

Related: https://github.com/open-webui/open-webui/discussions/3516#discussioncomment-10016923

GiteaMirror commented

2025-11-11 14:45:29 -06:00

@flyfox666 commented on GitHub (Jul 13, 2024):

Finally found this issue, which resolves my doubts, haha. I'm waiting for the next version to be released. I still hope that ordinary users can have their own workspace, with the administrator keeping oversight, so that a company's internal business units (BUs) can deploy more quickly and easily!

Really Appreciated

GiteaMirror referenced this issue

2026-04-19 18:57:52 -05:00

[GH-ISSUE #609] feat: monitoring dashboard #12141

GiteaMirror referenced this issue

2026-04-19 19:26:09 -05:00

[GH-ISSUE #1450] feat: admin dashboard #12501

GiteaMirror referenced this issue

2026-04-19 19:27:35 -05:00

[GH-ISSUE #1526] Is it possible to add a list or a chart in the admin backend to display the amount of requests for each model on the platform? #12537

GiteaMirror referenced this issue

2026-04-25 02:24:51 -05:00

[GH-ISSUE #609] feat: monitoring dashboard #27669