[GH-ISSUE #5736] bug: Open WebUI RAG Malfunction with Ollama Versions Post 0.2.1 #29331

Closed
opened 2026-04-22 08:05:53 -05:00 by GiteaMirror · 16 comments
Owner

Originally created by @silentoplayz on GitHub (Jul 17, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/5736

What is the issue?

Summary:

Retrieval-Augmented Generation (RAG) functionality within Open WebUI breaks when using Ollama versions later than 0.2.1 for local models. While external models (e.g., GroqCloud's Llama 3 8B) function correctly with RAG, local models fail to utilize the selected document, returning irrelevant or fabricated information. This issue occurs with both SentenceTransformers and Ollama RAG embedding models.

Affected Versions:

  • Ollama: 0.2.2, 0.2.3, 0.2.4, 0.2.5, 0.2.6, 0.2.7, 0.2.8
  • Open WebUI: Latest dev and main branches

Unaffected Versions:

  • Ollama: 0.2.1 and earlier
  • Open WebUI: ?

Steps to Reproduce:

  1. Clean Slate:
    • Downgrade Ollama to version 0.2.0 (verify with ollama --version).
    • In Open WebUI, clear all documents from the Workspace > Documents tab.
    • Navigate to Admin Panel > Settings > Documents and click Reset Upload Directory and Reset Vector Storage.
  2. Successful RAG Test (Ollama 0.2.0 & 0.2.1):
    • Add a .txt document to the Open WebUI Documents workspace.
    • Start a new chat and select the document using the # key.
    • Input a query related to the document content.
    • Verify that both local and external LLMs respond accurately, incorporating information from the selected document.
    • After upgrading to Ollama version 0.2.1 (verify with ollama --version), repeat steps 1 & 2.
  3. Failing RAG Test (Ollama 0.2.2 onwards):
    • Upgrade Ollama to version 0.2.2 (verify with ollama --version).
    • Start a new chat, select the same document from step 2 using the # key, and input the same query.
    • Observe that local LLMs fail to utilize the document content, providing irrelevant or fabricated responses.
    • Verify that external LLMs still function correctly with RAG.
    • Repeat step 3 for Ollama versions 0.2.3 through 0.3.0, observing the same behavior (a way to run this comparison directly against the Ollama API is sketched after this list).
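While the steps above go through Open WebUI, the same comparison can also be run straight against the Ollama server, which helps rule Open WebUI in or out. Below is a minimal sketch of that idea; the host, model name, file name, and query are assumptions, and only the /api/chat endpoint and payload shape come from the Ollama API.

```
# Rough repro sketch: ask a local model about a document by injecting the
# document text as a system message, roughly the way a RAG frontend would.
# Assumptions: Ollama serves on localhost:11434, a model named "llama3" is
# pulled, and "document.txt" is the test file from step 2.
import requests

with open("document.txt", encoding="utf-8") as f:
    context = f.read()

payload = {
    "model": "llama3",
    "stream": False,
    "messages": [
        {"role": "system", "content": "Answer using only this context:\n" + context},
        {"role": "user", "content": "What does the document say about <your query>?"},
    ],
}

resp = requests.post("http://localhost:11434/api/chat", json=payload, timeout=300)
resp.raise_for_status()
print(resp.json()["message"]["content"])
```

Running the same script against 0.2.1 and then 0.2.2 should show whether the regression reproduces with Open WebUI out of the loop.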

Expected Behavior:
Local LLMs should successfully utilize the selected document for RAG, providing accurate and relevant responses based on its content, regardless of the Ollama version used.

Actual Behavior:
Local LLMs fail to perform RAG accurately when using Ollama versions 0.2.2 and later, while external models remain unaffected. This occurs despite successful document loading and embedding generation (confirmed by testing with both SentenceTransformers and Ollama embedding models).
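The embedding-side claim above can be spot-checked against the server directly as well; a small sketch, assuming the legacy /api/embeddings endpoint from that era and an already-pulled embedding model such as nomic-embed-text:

```
# Sanity check: confirm the Ollama embeddings endpoint still returns a vector.
# Assumption: an embedding model like "nomic-embed-text" has been pulled.
import requests

resp = requests.post(
    "http://localhost:11434/api/embeddings",
    json={"model": "nomic-embed-text", "prompt": "sample chunk from the test document"},
    timeout=60,
)
resp.raise_for_status()
embedding = resp.json()["embedding"]
print(f"got an embedding with {len(embedding)} dimensions")
```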

Additional Notes:

  • The issue persists across multiple attempts, regenerations, and message edits.
  • The problem is not specific to a particular document or query, as it consistently occurs with different types of documents.
  • Resetting the Open WebUI upload directory and vector storage, as well as re-uploading documents, does not resolve the issue.
  • The issue is not related to Tika document extraction for RAG within Open WebUI, as confirmed through testing.
  • Downgrading Ollama to version 0.2.0 completely resolves the RAG malfunction within Open WebUI (a quick way to confirm which server build is running between downgrades is sketched after this list).
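Because these notes involve repeatedly switching Ollama builds, it is easy to accidentally test the wrong one. A tiny sketch for confirming which server version is actually answering before each test pass (endpoint from the Ollama API; the expected version string is only an example):

```
# Confirm which Ollama build is serving requests before running a RAG test.
import requests

version = requests.get("http://localhost:11434/api/version", timeout=10).json()["version"]
print("Ollama server version:", version)
assert version == "0.2.0", "server is not the build this test pass expects"  # adjust per pass
```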

Conclusion:
A regression appears to have been introduced in Ollama versions after 0.2.1, specifically impacting the interaction between Ollama and Open WebUI for local model RAG functionality. This issue necessitates investigation and resolution to ensure the proper functioning of RAG across all supported Ollama versions.

Related issue on the Open WebUI repo: https://github.com/open-webui/open-webui/discussions/3907

The maintainer of Open WebUI has also confirmed this bug on the latest version of Open WebUI in combination with the latest version of Ollama:
Screenshot 2024-07-16 185540: https://github.com/user-attachments/assets/206d0c37-237a-476d-87d4-1c44fa4ebe7b
Screenshot 2024-07-16 185651: https://github.com/user-attachments/assets/8874ef89-bfa9-42e7-abd3-fb76a093e279

Latest Open WebUI RAG + Ollama v0.2.2 (failures 100% of the time with local models it seems):
Screenshot: https://github.com/user-attachments/assets/9b4872a9-2912-4820-bd6e-b4f567aecc00

OS

Windows, Docker

GPU

AMD RX 6800 XT

CPU

Intel i7-12700K

Ollama version

0.2.2, 0.2.3, 0.2.4, 0.2.5, 0.2.6, 0.2.7, 0.2.8, 0.2.9, 0.3.0 (latest)

GiteaMirror added the bug label 2026-04-22 08:05:53 -05:00
Author
Owner

@Qualzz commented on GitHub (Jul 17, 2024):

Confirming the issue on my side too with the latest update.

GPU: NVIDIA RTX 4090
CPU: AMD Ryzen 9 7950X

Author
Owner

@EncodedBird commented on GitHub (Jul 17, 2024):

To help with variables, I also have the issue on Arch Linux with a non-Docker install.

GPU: AMD RX 6900 XT
CPU: AMD Ryzen 9 5950X

Author
Owner

@zouzeTG commented on GitHub (Jul 17, 2024):

Same issue for me!
I'm using an NVIDIA A100 GPU and an Intel CPU on Windows Server 2022.

Author
Owner

@zouzeTG commented on GitHub (Jul 17, 2024):

I'm going to do more tests, because I have also encountered a bug with version 0.2.1 with docx and xls files.
But I'm not sure.

Author
Owner

@silentoplayz commented on GitHub (Jul 18, 2024):

Update/Bump:

After thorough testing, it has been determined that setting the Top K value within Open WebUI's Documents settings to 1 works around the RAG compatibility issues on the affected Ollama versions (0.2.2 and later).

Additionally, configuring the context length for your RAG model to a higher number, such as 8192, has been found to maintain functionality with these specific Ollama versions.

These observations are based on empirical data collected by the maintainer of Open WebUI and myself following rigorous testing. The provided screenshots serve as visual evidence supporting these findings:

Setting to adjust (change this value to 1): https://github.com/user-attachments/assets/c299f84f-000a-4aaa-a06c-d9a33971dca5

Open WebUI maintainer's findings: https://github.com/user-attachments/assets/f6fa1891-c7e9-4ee7-84f8-aa894b74aa7b

RAG working with context length set to 8192 (tested on Ollama v0.2.5 with latest dev commit of Open WebUI):
https://github.com/user-attachments/assets/8391422d-0c9c-4777-a3cf-81629845edf3

RAG working with Top K set to 1 within Open WebUI's Documents settings (tested on Ollama v0.2.5 with latest dev commit of Open WebUI):
https://github.com/user-attachments/assets/3ab440b9-a802-4b13-9752-8a646be931db

We believe this information is important for addressing the bug report at hand and for ensuring that future installations and updates of Ollama maintain compatibility with Open WebUI's RAG functionality.
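To make the context-length side of this workaround concrete: the value set in Open WebUI ultimately reaches Ollama as the num_ctx option on the request, so the effect can be reproduced directly against the API as well. A hedged sketch (the model name and prompt strings are placeholders; the 8192 figure is the one reported above):

```
# Same kind of /api/chat request as in the repro sketch earlier in this issue,
# but with the context window raised explicitly via options.num_ctx so the
# injected document is less likely to be truncated.
import requests

payload = {
    "model": "llama3",  # placeholder: any affected local model
    "stream": False,
    "options": {"num_ctx": 8192},  # context length reported to restore RAG behavior
    "messages": [
        {"role": "system", "content": "Answer using only this context:\n<document text here>"},
        {"role": "user", "content": "What does the document say about <your query>?"},
    ],
}

resp = requests.post("http://localhost:11434/api/chat", json=payload, timeout=300)
print(resp.json()["message"]["content"])
```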

Author
Owner

@Qualzz commented on GitHub (Jul 18, 2024):

On my end, I have a very simple prompt; the total token count of the document is around 2k tokens, slightly more than the 2048 default. However, even if the model is set to 4096, or if it's set to 4096 in the "current chat settings", the issue is still there. Using 8112 works, but 4096 should be way more than enough for my query.

@silentoplayz it seems that the context length in the "current chat settings" on the sidebar doesn't have any effect. I can set it to 1, 5, or 654654654165; the model will answer the RAG query based only on the settings I've set in the model settings (Workspace > Model > Edit Model > Advanced Settings).

Author
Owner

@silentoplayz commented on GitHub (Jul 18, 2024):

> On my end, I have a very simple prompt; the total token count of the document is around 2k tokens, slightly more than the 2048 default. However, even if the model is set to 4096, or if it's set to 4096 in the "current chat settings", the issue is still there. Using 8112 works, but 4096 should be way more than enough for my query.
>
> @silentoplayz it seems that the context length in the "current chat settings" on the sidebar doesn't have any effect. I can set it to 1, 5, or 654654654165; the model will answer the RAG query based only on the settings I've set in the model settings (Workspace > Model > Edit Model > Advanced Settings).

It is odd that adjusting the Context Length value within the Chat Controls on the right-hand sidebar doesn't have any effect for you. I have noticed it works to "fix" RAG for me within Open WebUI, even just by raising the context length to 4096 in some cases.

I also came to report that the same set of issues I've reported here appear to be present in recently released versions of Ollama: v0.2.6 and v0.2.7.

Author
Owner

@Qualzz commented on GitHub (Jul 23, 2024):

Any update?

Author
Owner

@zouzeTG commented on GitHub (Jul 23, 2024):

Hi, any news on this problem?

Author
Owner

@ToeiRei commented on GitHub (Jul 25, 2024):

Looks like 0.3.0 still has that problem.

Author
Owner

@Qualzz commented on GitHub (Jul 25, 2024):

Is this an Open WebUI issue then?

Author
Owner

@silentoplayz commented on GitHub (Jul 25, 2024):

> Is this an Open WebUI issue then?

I honestly couldn't tell you. The maintainer of Open WebUI has been busy IRL lately and won't be available to check things out until at least the 1st of August. I've not heard anything from the Ollama team about this bug report.

But if I had to choose, it does appear to be an Ollama issue, as RAG works perfectly fine within the latest version of Open WebUI when paired with Ollama v0.2.1, v0.2.0, or any earlier version.

Author
Owner

@ToeiRei commented on GitHub (Jul 26, 2024):

I have tried it on multiple systems now: the Jetson container (arm64) wasn't affected. Looks like it's x86_64 that's acting up.

Author
Owner

@silentoplayz commented on GitHub (Jul 27, 2024):

After additional testing today on the dev branch of Open WebUI and the latest version of Ollama, I've concluded that for RAG to answer questions about documents, I need to adjust the models' context length, either in the recently added Chat Controls right-hand sidebar within Open WebUI or in the modelfile for a model under the Models section of the Workspace tab. This is still a bug in my view: RAG previously worked without any such adjustments, and I've talked with Open WebUI contributors enough to confirm that this is not normal behavior.

The workaround is to set the context length in the modelfile to an appropriate value such as 4096 or 8192 within Open WebUI. RAG should then (hopefully) work again without further complications.

C:\Users\G30>ollama --version
ollama version is 0.3.0
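For the modelfile route mentioned above, a Modelfile along these lines is one way to bake the larger context window into a local model. This is only an illustrative sketch: the base model and the custom model name are made up, the PARAMETER syntax comes from the Ollama Modelfile documentation, and the 8192 value is the one discussed in this thread.

```
# Hypothetical Modelfile: derive a variant of a local model with a larger
# default context window, then register it with:
#   ollama create llama3-8k -f Modelfile
FROM llama3
PARAMETER num_ctx 8192
```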
Author
Owner

@silentoplayz commented on GitHub (Jul 27, 2024):

Update: A workaround fix has been applied in the latest dev branch of Open WebUI, restoring RAG functionality to the way it worked before, i.e. it essentially works out of the box again. It's a bit of a sneaky/hacky fix, but it DOES work!

So what changed? Basically, instead of using the system prompt to provide context to the LLM, Open WebUI now injects the context into the user prompt. We specifically rely on system prompts to provide context, so this isn't an ideal fix, but it has been applied as a workaround for the time being.
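In API terms, the change described above amounts to moving the retrieved text from the system message into the user message. A rough illustration of the two payload shapes (placeholder strings only, not Open WebUI's actual RAG template):

```
# Before the workaround: retrieved context delivered via the system prompt.
# After the workaround: retrieved context prepended to the user prompt.
context = "<retrieved document chunks>"
question = "What does the document say about X?"

before = {
    "model": "llama3",
    "messages": [
        {"role": "system", "content": "Use this context:\n" + context},
        {"role": "user", "content": question},
    ],
}

after = {
    "model": "llama3",
    "messages": [
        {"role": "user", "content": "Use this context:\n" + context + "\n\nQuestion: " + question},
    ],
}
```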

Commit: https://github.com/open-webui/open-webui/commit/1aaa2e8219b5213725e137c65424f6cacab89b6b

I will be closing this issue unless someone else comes along to report issues that persist beyond this workaround fix.

Author
Owner

@windowshopr commented on GitHub (Dec 8, 2024):

I will say that I was having issues with this too: Ollama LLMs not reading the provided documents at all and generating bad answers. I tried providing the context as a user prompt instead of a system prompt, but the issue persisted, so I don't think it's a prompt issue. However, changing the context length in the advanced options helped. So look into the max context length of the LLM you're using and set that option to a multiple of it; I'm not sure what the "magic" amount is, but that helped me.

Reference: github-starred/ollama#29331