[GH-ISSUE #19905] issue: Knowlege provided as "entire document" / "bypass embedding" is always injected at the END of chat flow, breaking KV prefix caching #57700

Closed
opened 2026-05-05 21:26:21 -05:00 by GiteaMirror · 5 comments
Owner

Originally created by @adhusch on GitHub (Dec 12, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/19905

Check Existing Issues

  • I have searched for any existing and/or related issues.
  • I have searched for any existing and/or related discussions.
  • I have also searched in the CLOSED issues AND CLOSED discussions and found no related items (your issue might already be addressed on the development branch!).
  • I am using the latest version of Open WebUI.

Installation Method

Docker

Open WebUI Version

v0.6.41

Ollama Version (if applicable)

N/A

Operating System

N/A

Browser (if applicable)

N/A

Confirmation

  • I have read and followed all instructions in README.md.
  • I am using the latest version of both Open WebUI and Ollama.
  • I have included the browser console logs.
  • I have included the Docker container logs.
  • I have provided every relevant configuration, setting, and environment variable used in my setup.
  • I have clearly listed every relevant configuration, custom setting, environment variable, and command-line option that influences my setup (such as Docker Compose overrides, .env values, browser settings, authentication configurations, etc).
  • I have documented step-by-step reproduction instructions that are precise, sequential, and leave nothing to interpretation. My steps:
  • Start with the initial platform/version/OS and dependencies used,
  • Specify exact install/launch/configure commands,
  • List URLs visited, user input (incl. example values/emails/passwords if needed),
  • Describe all options and toggles enabled or changed,
  • Include any files or environmental changes,
  • Identify the expected and actual result at each stage,
  • Ensure any reasonably skilled user can follow and hit the same issue.

Expected Behavior

When using "bypass embedding and retrival" and/or "using entire docment" mode for knowledge added to a model, this knowledge should be included in the first message of the (JSON) chat flow and remain at the first position. By that prefix KV caching can work as expected, the potentially large knowledge part remains cached, and follow up questions need token generation only for the small follow up question part.

Actual Behavior

Since some time, the knowlege in "using entire document" mode is addeds as the LAST element of the chat JSON, follow up question are injected BEFORE. By that any resonable KV caching is broken as the chat prefix changes with each follow up and chat response is slow as the whole knowlege has to be reprocessed. See minimal example below.

See also llama.cpp server issue https://github.com/ggml-org/llama.cpp/issues/17673

Potentially related openwebui issue #19099

Steps to Reproduce

I configured openwebui to "bypass embedding and retrival" and added a test knowledge base that only contains one document with the text of the poem "The Raven", configured this KB as "Use Entire Document" and intercepted the JSON requests using LiteLLM. Result as described, we see that the knowledge is always at the END of the JSON while the follow up requests are injected BEFORE thus the prefix is unstable and the common prefix drops towards zero for a long context KB (prefix only matching the system/user prompt with can be very short compared to the KB text potenitally consisting of several 10k of token):

Initial request:

{
"model": "",
"stream": true,
"messages": [
{
"role": "system",
"content": "You are an assistant helping to interpret poems."
},
{
"role": "user",
"content": "# ### Task:\nRespond to the user query using the provided context, incorporating inline citations in the format [id] only when the tag includes an explicit id attribute (e.g., <source id="1">).\n\n### Guidelines:\n- If you don't know the answer, clearly state that.\n- If uncertain, ask the user for clarification.\n- Respond in the same language as the user's query.\n- If the context is unreadable or of poor quality, inform the user and provide the best possible answer.\n- If the answer isn't present in the context but you possess the knowledge, explain this to the user and provide the answer using your own understanding.\n- **Only include inline citations using [id] (e.g., [1], [2]) when the ta... (litellm_truncated skipped 6367 chars) ...e!"
Quoth the Raven, "Nevermore."

"Prophet!" said I, "thing of evil- prophet still, if bird or
devil!
By that Heaven that bends above us- by that God we both adore-
Tell this soul with sorrow laden if, within the distant Aidenn,
It shall clasp a sainted maiden whom the angels name Lenore-
Clasp a rare and radiant maiden whom the angels name Lenore."
Quoth the Raven, "Nevermore."

"Be that word our sign in parting, bird or fiend," I shrieked,
upstarting-
"Get thee back into the tempest and the Night's Plutonian shore!
Leave no black plume as a token of that lie thy soul hath spoken!
Leave my loneliness unbroken!- quit the bust above my door!
Take thy beak from out my heart, and take thy form from off my
door!"
Quoth the Raven, "Nevermore."

And the Raven, never flitting, still is sitting, still is sitting
On the pallid bust of Pallas just above my chamber door;
And his eyes have all the seeming of a demon's that is dreaming,
And the lamplight o'er him streaming throws his shadow on the
floor;
And my soul from out that shadow that lies floating on the floor
Shall be lifted- nevermore!\n\n\n<user_query>\nWhats the overall message of the poem?\n</user_query>\n\nWhats the overall message of the poem?"
}
],
"stream_options": {
"include_usage": true
}
}

Follow up request:

{
"model": "",
"stream": true,
"messages": [
{
"role": "system",
"content": "You are an assistant helping to interpret poems."
},
{
"role": "user",
"content": "Whats the overall message of the poem?"
},
{
"role": "assistant",
"content": "The overall message of the poem "The Raven" by Edgar Allan Poe is one of profound grief, loss, and the inability to escape the haunting memories of a loved one. The narrator, overwhelmed by sorrow for his lost love Lenore, is visited by a raven that repeatedly utters the word "Nevermore." This interaction underscores the narrator's despair and the inescapable nature of his sorrow. The raven serves as a symbol of the narrator's persistent and unyielding grief, emphasizing that there is no comfort or respite from his pain [1]."
},
{
"role": "user",
"content": "# ### Task:\nRespond to the user query using the provided context, incorporating inline citations in the format [id] only when the tag includes an explicit id attribute (e.g., <source id="1">).\n\n### Guidelines:\n- If you don't know the answer, clearly state that.\n- If uncertain, ask the user for clarification.\n- Respond in the same language as the user's query.\n- If the context is unreadable or of poor quality, inform the user and provide the best possible answer.\n- If the answer isn't present in the context but you possess the knowledge, explain this to the user and provide the answer using your own understanding.\n- **Only include inline citations using [id] (e.g., [1], [2]) when the ta... (litellm_truncated skipped 6415 chars) ...Prophet!" said I, "thing of evil- prophet still, if bird or
devil!
By that Heaven that bends above us- by that God we both adore-
Tell this soul with sorrow laden if, within the distant Aidenn,
It shall clasp a sainted maiden whom the angels name Lenore-
Clasp a rare and radiant maiden whom the angels name Lenore."
Quoth the Raven, "Nevermore."

"Be that word our sign in parting, bird or fiend," I shrieked,
upstarting-
"Get thee back into the tempest and the Night's Plutonian shore!
Leave no black plume as a token of that lie thy soul hath spoken!
Leave my loneliness unbroken!- quit the bust above my door!
Take thy beak from out my heart, and take thy form from off my
door!"
Quoth the Raven, "Nevermore."

And the Raven, never flitting, still is sitting, still is sitting
On the pallid bust of Pallas just above my chamber door;
And his eyes have all the seeming of a demon's that is dreaming,
And the lamplight o'er him streaming throws his shadow on the
floor;
And my soul from out that shadow that lies floating on the floor
Shall be lifted- nevermore!\n\n\n<user_query>\nWhat are some of the literary devices Poe uses in 'The Raven'?\n</user_query>\n\nWhat are some of the literary devices Poe uses in 'The Raven'?"
}
],
"stream_options": {
"include_usage": true
}
}

2nd Follow-up Request:

{
"model": "",
"stream": true,
"messages": [
{
"role": "system",
"content": "You are an assistant helping to interpret poems."
},
{
"role": "user",
"content": "Whats the overall message of the poem?"
},
{
"role": "assistant",
"content": "The overall message of the poem "The Raven" by Edgar Allan Poe is one of profound grief, loss, and the inability to escape the haunting memories of a loved one. The narrator, overwhelmed by sorrow for his lost love Lenore, is visited by a raven that repeatedly utters the word "Nevermore." This interaction underscores the narrator's despair and the inescapable nature of his sorrow. The raven serves as a symbol of the narrator's persistent and unyielding grief, emphasizing that there is no comfort or respite from his pain [1]."
},
{
"role": "user",
"content": "What are some of the literary devices Poe uses in 'The Raven'?"
},
{
"role": "assistant",
"content": "Edgar Allan Poe employs a variety of literary devices in "The Raven" to create its haunting and melancholic atmosphere. Some of the key literary devices include:\n\n1. Alliteration: Poe uses alliteration to create rhythm and musicality in the poem. For example, "Deep into that darkness peering, long I stood there wondering, fearing" [1].\n\n2. Repetition: The repetition of the word "Nevermore" by the raven creates a sense of inevitability and despair. This repetition also reinforces the theme of loss and the inability to escape grief [1].\n\n3. Symbolism: The raven itself is a powerful symbol of grief, loss, and the inescapable nature of sorrow. The bird's repeated utterance of "Nevermore" symbolizes the narrator's acceptance of his eternal grief [1].\n\n4. Imagery: Poe uses vivid and evocative imagery to create a sense of atmosphere and mood. For example, "Ah, distinctly I remember it was in the bleak December" sets a somber and bleak tone [1].\n\n5. Personification: The raven is personified as a visitor and a prophet, which adds to the eerie and supernatural atmosphere of the poem [1].\n\n6. Internal Rhyme: Poe uses internal rhyme to create a musical and rhythmic effect. For example, "Once upon a midnight dreary, while I pondered, weak and weary" [1].\n\n7. Refrain: The repeated use of the phrase "Nevermore" serves as a refrain, reinforcing the poem's themes and creating a sense of inevitability [1].\n\nThese literary devices work together to create the poem's haunting and melancholic atmosphere, emphasizing the narrator's profound grief and the inescapable nature of his sorrow."
},
{
"role": "user",
"content": "# ### Task:\nRespond to the user query using the provided context, incorporating inline citations in the format [id] only when the tag includes an explicit id attribute (e.g., <source id="1">).\n\n### Guidelines:\n- If you don't know the answer, clearly state that.\n- If uncertain, ask the user for clarification.\n- Respond in the same language as the user's query.\n- If the context is unreadable or of poor quality, inform the user and provide the best possible answer.\n- If the answer isn't present in the context but you possess the knowledge, explain this to the user and provide the answer using your own understanding.\n- **Only include inline citations using [id] (e.g., [1], [2]) when the ta... (litellm_truncated skipped 6433 chars) ..."thing of evil- prophet still, if bird or
devil!
By that Heaven that bends above us- by that God we both adore-
Tell this soul with sorrow laden if, within the distant Aidenn,
It shall clasp a sainted maiden whom the angels name Lenore-
Clasp a rare and radiant maiden whom the angels name Lenore."
Quoth the Raven, "Nevermore."

"Be that word our sign in parting, bird or fiend," I shrieked,
upstarting-
"Get thee back into the tempest and the Night's Plutonian shore!
Leave no black plume as a token of that lie thy soul hath spoken!
Leave my loneliness unbroken!- quit the bust above my door!
Take thy beak from out my heart, and take thy form from off my
door!"
Quoth the Raven, "Nevermore."

And the Raven, never flitting, still is sitting, still is sitting
On the pallid bust of Pallas just above my chamber door;
And his eyes have all the seeming of a demon's that is dreaming,
And the lamplight o'er him streaming throws his shadow on the
floor;
And my soul from out that shadow that lies floating on the floor
Shall be lifted- nevermore!\n\n\n<user_query>\nCan you explain the significance of the raven's appearance in the poem?\n</user_query>\n\nCan you explain the significance of the raven's appearance in the poem?"
}
],
"stream_options": {
"include_usage": true
}
}

Logs & Screenshots

See steps to reproduce

Additional Information

Inject knwoledge always at the beginning, immediatly after the system prompt and keep it there at all costs.

Originally created by @adhusch on GitHub (Dec 12, 2025). Original GitHub issue: https://github.com/open-webui/open-webui/issues/19905 ### Check Existing Issues - [x] I have searched for any existing and/or related issues. - [x] I have searched for any existing and/or related discussions. - [x] I have also searched in the CLOSED issues AND CLOSED discussions and found no related items (your issue might already be addressed on the development branch!). - [x] I am using the latest version of Open WebUI. ### Installation Method Docker ### Open WebUI Version v0.6.41 ### Ollama Version (if applicable) N/A ### Operating System N/A ### Browser (if applicable) N/A ### Confirmation - [x] I have read and followed all instructions in `README.md`. - [x] I am using the latest version of **both** Open WebUI and Ollama. - [x] I have included the browser console logs. - [x] I have included the Docker container logs. - [x] I have **provided every relevant configuration, setting, and environment variable used in my setup.** - [x] I have clearly **listed every relevant configuration, custom setting, environment variable, and command-line option that influences my setup** (such as Docker Compose overrides, .env values, browser settings, authentication configurations, etc). - [x] I have documented **step-by-step reproduction instructions that are precise, sequential, and leave nothing to interpretation**. My steps: - Start with the initial platform/version/OS and dependencies used, - Specify exact install/launch/configure commands, - List URLs visited, user input (incl. example values/emails/passwords if needed), - Describe all options and toggles enabled or changed, - Include any files or environmental changes, - Identify the expected and actual result at each stage, - Ensure any reasonably skilled user can follow and hit the same issue. ### Expected Behavior When using "bypass embedding and retrival" and/or "using entire docment" mode for knowledge added to a model, this knowledge should be included in the first message of the (JSON) chat flow and remain at the first position. By that prefix KV caching can work as expected, the potentially large knowledge part remains cached, and follow up questions need token generation only for the small follow up question part. ### Actual Behavior Since some time, the knowlege in "using entire document" mode is addeds as the LAST element of the chat JSON, follow up question are injected BEFORE. By that any resonable KV caching is broken as the chat prefix changes with each follow up and chat response is slow as the whole knowlege has to be reprocessed. See minimal example below. See also llama.cpp server issue https://github.com/ggml-org/llama.cpp/issues/17673 Potentially related openwebui issue #19099 ### Steps to Reproduce I configured openwebui to "bypass embedding and retrival" and added a test knowledge base that only contains one document with the text of the poem "The Raven", configured this KB as "Use Entire Document" and intercepted the JSON requests using LiteLLM. Result as described, we see that the knowledge is always at the END of the JSON while the follow up requests are injected BEFORE thus the prefix is unstable and the common prefix drops towards zero for a long context KB (prefix only matching the system/user prompt with can be very short compared to the KB text potenitally consisting of several 10k of token): ## Initial request: { "model": "<redacted>", "stream": true, "messages": [ { "role": "system", "content": "You are an assistant helping to interpret poems." }, { "role": "user", "content": "# ### Task:\nRespond to the user query using the provided context, incorporating inline citations in the format [id] **only when the <source> tag includes an explicit id attribute** (e.g., <source id=\"1\">).\n\n### Guidelines:\n- If you don't know the answer, clearly state that.\n- If uncertain, ask the user for clarification.\n- Respond in the same language as the user's query.\n- If the context is unreadable or of poor quality, inform the user and provide the best possible answer.\n- If the answer isn't present in the context but you possess the knowledge, explain this to the user and provide the answer using your own understanding.\n- **Only include inline citations using [id] (e.g., [1], [2]) when the <source> ta... (litellm_truncated skipped 6367 chars) ...e!\"<br/>Quoth the Raven, \"Nevermore.\"<br/><br/>\"Prophet!\" said I, \"thing of evil- prophet still, if bird or<br/>devil!<br/>By that Heaven that bends above us- by that God we both adore-<br/>Tell this soul with sorrow laden if, within the distant Aidenn,<br/>It shall clasp a sainted maiden whom the angels name Lenore-<br/>Clasp a rare and radiant maiden whom the angels name Lenore.\"<br/>Quoth the Raven, \"Nevermore.\"<br/><br/>\"Be that word our sign in parting, bird or fiend,\" I shrieked,<br/>upstarting-<br/>\"Get thee back into the tempest and the Night's Plutonian shore!<br/>Leave no black plume as a token of that lie thy soul hath spoken!<br/>Leave my loneliness unbroken!- quit the bust above my door!<br/>Take thy beak from out my heart, and take thy form from off my<br/>door!\"<br/>Quoth the Raven, \"Nevermore.\"<br/><br/>And the Raven, never flitting, still is sitting, still is sitting<br/>On the pallid bust of Pallas just above my chamber door;<br/>And his eyes have all the seeming of a demon's that is dreaming,<br/>And the lamplight o'er him streaming throws his shadow on the<br/>floor;<br/>And my soul from out that shadow that lies floating on the floor<br/>Shall be lifted- nevermore!</source>\n</context>\n\n<user_query>\nWhats the overall message of the poem?\n</user_query>\n\nWhats the overall message of the poem?" } ], "stream_options": { "include_usage": true } } ## Follow up request: { "model": "<redacted>", "stream": true, "messages": [ { "role": "system", "content": "You are an assistant helping to interpret poems." }, { "role": "user", "content": "Whats the overall message of the poem?" }, { "role": "assistant", "content": "The overall message of the poem \"The Raven\" by Edgar Allan Poe is one of profound grief, loss, and the inability to escape the haunting memories of a loved one. The narrator, overwhelmed by sorrow for his lost love Lenore, is visited by a raven that repeatedly utters the word \"Nevermore.\" This interaction underscores the narrator's despair and the inescapable nature of his sorrow. The raven serves as a symbol of the narrator's persistent and unyielding grief, emphasizing that there is no comfort or respite from his pain [1]." }, { "role": "user", "content": "# ### Task:\nRespond to the user query using the provided context, incorporating inline citations in the format [id] **only when the <source> tag includes an explicit id attribute** (e.g., <source id=\"1\">).\n\n### Guidelines:\n- If you don't know the answer, clearly state that.\n- If uncertain, ask the user for clarification.\n- Respond in the same language as the user's query.\n- If the context is unreadable or of poor quality, inform the user and provide the best possible answer.\n- If the answer isn't present in the context but you possess the knowledge, explain this to the user and provide the answer using your own understanding.\n- **Only include inline citations using [id] (e.g., [1], [2]) when the <source> ta... (litellm_truncated skipped 6415 chars) ...Prophet!\" said I, \"thing of evil- prophet still, if bird or<br/>devil!<br/>By that Heaven that bends above us- by that God we both adore-<br/>Tell this soul with sorrow laden if, within the distant Aidenn,<br/>It shall clasp a sainted maiden whom the angels name Lenore-<br/>Clasp a rare and radiant maiden whom the angels name Lenore.\"<br/>Quoth the Raven, \"Nevermore.\"<br/><br/>\"Be that word our sign in parting, bird or fiend,\" I shrieked,<br/>upstarting-<br/>\"Get thee back into the tempest and the Night's Plutonian shore!<br/>Leave no black plume as a token of that lie thy soul hath spoken!<br/>Leave my loneliness unbroken!- quit the bust above my door!<br/>Take thy beak from out my heart, and take thy form from off my<br/>door!\"<br/>Quoth the Raven, \"Nevermore.\"<br/><br/>And the Raven, never flitting, still is sitting, still is sitting<br/>On the pallid bust of Pallas just above my chamber door;<br/>And his eyes have all the seeming of a demon's that is dreaming,<br/>And the lamplight o'er him streaming throws his shadow on the<br/>floor;<br/>And my soul from out that shadow that lies floating on the floor<br/>Shall be lifted- nevermore!</source>\n</context>\n\n<user_query>\nWhat are some of the literary devices Poe uses in 'The Raven'?\n</user_query>\n\nWhat are some of the literary devices Poe uses in 'The Raven'?" } ], "stream_options": { "include_usage": true } } ## 2nd Follow-up Request: { "model": "<redacted>", "stream": true, "messages": [ { "role": "system", "content": "You are an assistant helping to interpret poems." }, { "role": "user", "content": "Whats the overall message of the poem?" }, { "role": "assistant", "content": "The overall message of the poem \"The Raven\" by Edgar Allan Poe is one of profound grief, loss, and the inability to escape the haunting memories of a loved one. The narrator, overwhelmed by sorrow for his lost love Lenore, is visited by a raven that repeatedly utters the word \"Nevermore.\" This interaction underscores the narrator's despair and the inescapable nature of his sorrow. The raven serves as a symbol of the narrator's persistent and unyielding grief, emphasizing that there is no comfort or respite from his pain [1]." }, { "role": "user", "content": "What are some of the literary devices Poe uses in 'The Raven'?" }, { "role": "assistant", "content": "Edgar Allan Poe employs a variety of literary devices in \"The Raven\" to create its haunting and melancholic atmosphere. Some of the key literary devices include:\n\n1. **Alliteration**: Poe uses alliteration to create rhythm and musicality in the poem. For example, \"Deep into that darkness peering, long I stood there wondering, fearing\" [1].\n\n2. **Repetition**: The repetition of the word \"Nevermore\" by the raven creates a sense of inevitability and despair. This repetition also reinforces the theme of loss and the inability to escape grief [1].\n\n3. **Symbolism**: The raven itself is a powerful symbol of grief, loss, and the inescapable nature of sorrow. The bird's repeated utterance of \"Nevermore\" symbolizes the narrator's acceptance of his eternal grief [1].\n\n4. **Imagery**: Poe uses vivid and evocative imagery to create a sense of atmosphere and mood. For example, \"Ah, distinctly I remember it was in the bleak December\" sets a somber and bleak tone [1].\n\n5. **Personification**: The raven is personified as a visitor and a prophet, which adds to the eerie and supernatural atmosphere of the poem [1].\n\n6. **Internal Rhyme**: Poe uses internal rhyme to create a musical and rhythmic effect. For example, \"Once upon a midnight dreary, while I pondered, weak and weary\" [1].\n\n7. **Refrain**: The repeated use of the phrase \"Nevermore\" serves as a refrain, reinforcing the poem's themes and creating a sense of inevitability [1].\n\nThese literary devices work together to create the poem's haunting and melancholic atmosphere, emphasizing the narrator's profound grief and the inescapable nature of his sorrow." }, { "role": "user", "content": "# ### Task:\nRespond to the user query using the provided context, incorporating inline citations in the format [id] **only when the <source> tag includes an explicit id attribute** (e.g., <source id=\"1\">).\n\n### Guidelines:\n- If you don't know the answer, clearly state that.\n- If uncertain, ask the user for clarification.\n- Respond in the same language as the user's query.\n- If the context is unreadable or of poor quality, inform the user and provide the best possible answer.\n- If the answer isn't present in the context but you possess the knowledge, explain this to the user and provide the answer using your own understanding.\n- **Only include inline citations using [id] (e.g., [1], [2]) when the <source> ta... (litellm_truncated skipped 6433 chars) ...\"thing of evil- prophet still, if bird or<br/>devil!<br/>By that Heaven that bends above us- by that God we both adore-<br/>Tell this soul with sorrow laden if, within the distant Aidenn,<br/>It shall clasp a sainted maiden whom the angels name Lenore-<br/>Clasp a rare and radiant maiden whom the angels name Lenore.\"<br/>Quoth the Raven, \"Nevermore.\"<br/><br/>\"Be that word our sign in parting, bird or fiend,\" I shrieked,<br/>upstarting-<br/>\"Get thee back into the tempest and the Night's Plutonian shore!<br/>Leave no black plume as a token of that lie thy soul hath spoken!<br/>Leave my loneliness unbroken!- quit the bust above my door!<br/>Take thy beak from out my heart, and take thy form from off my<br/>door!\"<br/>Quoth the Raven, \"Nevermore.\"<br/><br/>And the Raven, never flitting, still is sitting, still is sitting<br/>On the pallid bust of Pallas just above my chamber door;<br/>And his eyes have all the seeming of a demon's that is dreaming,<br/>And the lamplight o'er him streaming throws his shadow on the<br/>floor;<br/>And my soul from out that shadow that lies floating on the floor<br/>Shall be lifted- nevermore!</source>\n</context>\n\n<user_query>\nCan you explain the significance of the raven's appearance in the poem?\n</user_query>\n\nCan you explain the significance of the raven's appearance in the poem?" } ], "stream_options": { "include_usage": true } } ### Logs & Screenshots See steps to reproduce ### Additional Information Inject knwoledge always at the **beginning**, immediatly after the system prompt and keep it there at all costs.
GiteaMirror added the bugconfirmed issue labels 2026-05-05 21:26:21 -05:00
Author
Owner

@owui-terminator[bot] commented on GitHub (Dec 12, 2025):

🔍 Similar Issues Found

I found some existing issues that might be related to this one. Please check if any of these are duplicates or contain helpful solutions:

  1. #19752 issue: minor UI Bug: knowledge sharing
    by mahenning • Dec 04, 2025 • bug

  2. #19864 issue:
    by Haervwe • Dec 10, 2025 • bug

  3. #19777 issue:
    by Yaute7 • Dec 05, 2025 • bug

  4. #19861 issue:
    by QuitHub • Dec 10, 2025 • bug

  5. #19877 issue:
    by dotmobo • Dec 11, 2025 • bug

Show 5 more related issues
  1. #19563 issue:
    by naruto7g • Nov 28, 2025 • bug

  2. #19491 issue: Referencing Knowledge in chats with '#' no longer shows which knowledge base does the file belong to
    by donatas-xyz • Nov 25, 2025 • bug

  3. #18158 issue: Knowledge Base ignores embedding and loads full sources
    by Polak149 • Oct 08, 2025 • bug

  4. #19211 issue:
    by Byrnes9 • Nov 16, 2025 • bug

  5. #16794 issue: When "Knowledge" is disabled for users, it is still possible to add "Knowledge" to chat Folders
    by jsweetzer-ea • Aug 21, 2025 • bug


💡 Tips:

  • If this is a duplicate, please consider closing this issue and adding any additional details to the existing one
  • If you found a solution in any of these issues, please share it here to help others

This comment was generated automatically by a bot. Please react with a 👍 if this comment was helpful, or a 👎 if it was not.

<!-- gh-comment-id:3645422361 --> @owui-terminator[bot] commented on GitHub (Dec 12, 2025): 🔍 **Similar Issues Found** I found some existing issues that might be related to this one. Please check if any of these are duplicates or contain helpful solutions: 1. [#19752](https://github.com/open-webui/open-webui/issues/19752) **issue: minor UI Bug: knowledge sharing** *by mahenning • Dec 04, 2025 • `bug`* 2. [#19864](https://github.com/open-webui/open-webui/issues/19864) **issue:** *by Haervwe • Dec 10, 2025 • `bug`* 3. [#19777](https://github.com/open-webui/open-webui/issues/19777) **issue:** *by Yaute7 • Dec 05, 2025 • `bug`* 4. [#19861](https://github.com/open-webui/open-webui/issues/19861) **issue:** *by QuitHub • Dec 10, 2025 • `bug`* 5. [#19877](https://github.com/open-webui/open-webui/issues/19877) **issue:** *by dotmobo • Dec 11, 2025 • `bug`* <details> <summary>Show 5 more related issues</summary> 6. [#19563](https://github.com/open-webui/open-webui/issues/19563) **issue:** *by naruto7g • Nov 28, 2025 • `bug`* 7. [#19491](https://github.com/open-webui/open-webui/issues/19491) **issue: Referencing Knowledge in chats with '#' no longer shows which knowledge base does the file belong to** *by donatas-xyz • Nov 25, 2025 • `bug`* 8. [#18158](https://github.com/open-webui/open-webui/issues/18158) **issue: Knowledge Base ignores embedding and loads full sources** *by Polak149 • Oct 08, 2025 • `bug`* 9. [#19211](https://github.com/open-webui/open-webui/issues/19211) **issue:** *by Byrnes9 • Nov 16, 2025 • `bug`* 10. [#16794](https://github.com/open-webui/open-webui/issues/16794) **issue: When "Knowledge" is disabled for users, it is still possible to add "Knowledge" to chat Folders** *by jsweetzer-ea • Aug 21, 2025 • `bug`* </details> --- 💡 **Tips:** - If this is a duplicate, please consider closing this issue and adding any additional details to the existing one - If you found a solution in any of these issues, please share it here to help others *This comment was generated automatically by a bot.* Please react with a 👍 if this comment was helpful, or a 👎 if it was not.
Author
Owner

@silentoplayz commented on GitHub (Dec 16, 2025):

I had Gemini 3 Pro Preview help to test this issue out on my end by having it look at my Ollama logs and it believes that I have reproduced this issue. Here's how/why:

Gemini 3 Pro Preview's response:

Based on the debug logs you provided, I can confirm that the bug is happening.

Your system is not using the cached document for the second message. It is re-reading the entire file, which confirms that the text order has changed (the huge file likely moved to the end of the prompt).

Here is the evidence from your logs:

1. The "Smoking Gun" Log Entry

Look at the request at 01:10:40. This is your follow-up message ("Tell me more").

time=2025-12-16T01:10:40.730Z ... msg="completion request" ... prompt=14048 ...
time=2025-12-16T01:10:40.737Z ... msg="loading cache slot" id=0 cache=720 prompt=3906 ...
  • prompt=14048: This is the total size of the request (Chat History + The File).
  • cache=720: This is how much of the previous context Ollama found in memory. This is the failure.
    • If the file (12k+ tokens) was still at the start of the prompt, this number should be roughly 12000 or 13000.
    • Since it is only 720, it means only the System Prompt (instructions) matched. The massive file content did not match the cache, meaning it was moved to a different position in the JSON array (likely the end).

2. The Timing Evidence

  • First Request (01:10:12): Took ~4.0 seconds to process the file and reply.
  • Second Request (01:10:40): Took ~4.9 seconds to reply.

If the cache were working (i.e., if the bug were fixed), the second request should have taken less than 1 second to start generating, because the heavy lifting (reading the file) was already done. Instead, it did all the work again.

Conclusion

The issue is reproducible on your setup. Open WebUI is appending the document to the end of the chat history on follow-up messages, which invalidates the KV Cache in Ollama, forcing a full re-process every time you talk.

To fix this locally (until the devs patch it):
You would need to modify the Open WebUI source code (specifically looking for where payload.messages is constructed in the backend) to ensure the file content remains inserted at messages[0] or messages[1], rather than appended to the end.

2nd attempt verdict (as a screenshot):

Image
<!-- gh-comment-id:3658281048 --> @silentoplayz commented on GitHub (Dec 16, 2025): I had Gemini 3 Pro Preview help to test this issue out on my end by having it look at my Ollama logs and it believes that I have reproduced this issue. Here's how/why: ### Gemini 3 Pro Preview's response: Based on the debug logs you provided, I can **confirm that the bug is happening.** Your system is **not** using the cached document for the second message. It is re-reading the entire file, which confirms that the text order has changed (the huge file likely moved to the end of the prompt). Here is the evidence from your logs: ### 1. The "Smoking Gun" Log Entry Look at the request at `01:10:40`. This is your follow-up message ("Tell me more"). ```text time=2025-12-16T01:10:40.730Z ... msg="completion request" ... prompt=14048 ... time=2025-12-16T01:10:40.737Z ... msg="loading cache slot" id=0 cache=720 prompt=3906 ... ``` * **`prompt=14048`**: This is the total size of the request (Chat History + The File). * **`cache=720`**: This is how much of the previous context Ollama found in memory. **This is the failure.** * If the file (12k+ tokens) was still at the **start** of the prompt, this number should be roughly `12000` or `13000`. * Since it is only `720`, it means only the System Prompt (instructions) matched. The massive file content did not match the cache, meaning it was moved to a different position in the JSON array (likely the end). ### 2. The Timing Evidence * **First Request (01:10:12):** Took **~4.0 seconds** to process the file and reply. * **Second Request (01:10:40):** Took **~4.9 seconds** to reply. If the cache were working (i.e., if the bug were fixed), the second request should have taken **less than 1 second** to start generating, because the heavy lifting (reading the file) was already done. Instead, it did all the work again. ### Conclusion **The issue is reproducible on your setup.** Open WebUI is appending the document to the **end** of the chat history on follow-up messages, which invalidates the KV Cache in Ollama, forcing a full re-process every time you talk. **To fix this locally (until the devs patch it):** You would need to modify the Open WebUI source code (specifically looking for where `payload.messages` is constructed in the backend) to ensure the file content remains inserted at `messages[0]` or `messages[1]`, rather than appended to the end. 2nd attempt verdict (as a screenshot): <img width="1047" height="1082" alt="Image" src="https://github.com/user-attachments/assets/60a4ae98-3306-4acc-9ea8-a1257e9ab746" />
Author
Owner

@adhusch commented on GitHub (Dec 17, 2025):

Maybe you could consider adding the static knowledge as system message instead of appending it to a user message, as in the current middleware logic the system messages seem to go on top all the time (beeing inserted instead of appended) while user messages are alwayx appended at the end of the chat list (with potentially having content appended to them). The "append" flag name is a bit confusing, as it seems to only control "appending" (meaning post fix concatentation to a string) to a message, but_not_ appending vs inserting to the message list, which is statically decided based on the role system vs. user.

<!-- gh-comment-id:3664039055 --> @adhusch commented on GitHub (Dec 17, 2025): Maybe you could consider adding the static knowledge as system message instead of appending it to a user message, as in the current middleware logic the system messages seem to go on top all the time (beeing inserted instead of appended) while user messages are alwayx appended at the end of the chat list (with potentially having content appended to them). The "append" flag name is a bit confusing, as it seems to only control "appending" (meaning post fix concatentation to a string) to a _message,_ but_not_ appending vs inserting to the message _list_, which is statically decided based on the role system vs. user.
Author
Owner

@Podden commented on GitHub (Dec 22, 2025):

I saw this behaviour in earlier versions as well. Before, it was added to the System promt.

But for chunked retrieval, this new behaviour is actually great for caching! I can check if the conversation is using RAG and just exclude the newest message from being cached (eg. In my anthropic pipe). But for the entire document content, adding to the System promt would be more appropiate. Problem is, as pipe dev, AFAIK I currently cant tell the difference.

I dont have a reliable way to know IF the user is referencing knowledge or a file as well as knowing which retrieval mode for which reference. I would love to be able to handle All the different cases in a cache friendly manner. Maybe I do not want to content to be added at All because I want to attach the entire pdf file as base64 or something. Please add metadata for this or something.

Easiest fix for now:
Chunked retrieval = add to newest message
Entire context/Bypass embedding = add to System promt like before

<!-- gh-comment-id:3684324202 --> @Podden commented on GitHub (Dec 22, 2025): I saw this behaviour in earlier versions as well. Before, it was added to the System promt. But for chunked retrieval, this new behaviour is actually great for caching! I can check if the conversation is using RAG and just exclude the newest message from being cached (eg. In my anthropic pipe). But for the entire document content, adding to the System promt would be more appropiate. Problem is, as pipe dev, AFAIK I currently cant tell the difference. I dont have a reliable way to know IF the user is referencing knowledge or a file as well as knowing which retrieval mode for which reference. I would love to be able to handle All the different cases in a cache friendly manner. Maybe I do not want to content to be added at All because I want to attach the entire pdf file as base64 or something. Please add metadata for this or something. Easiest fix for now: Chunked retrieval = add to newest message Entire context/Bypass embedding = add to System promt like before
Author
Owner

@Classic298 commented on GitHub (Dec 30, 2025):

Strongly related: https://github.com/open-webui/open-webui/issues/12459

<!-- gh-comment-id:3699258071 --> @Classic298 commented on GitHub (Dec 30, 2025): Strongly related: https://github.com/open-webui/open-webui/issues/12459
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: github-starred/open-webui#57700