mirror of
https://github.com/open-webui/open-webui.git
synced 2026-05-06 19:08:59 -05:00
[GH-ISSUE #19905] issue: Knowledge provided as "entire document" / "bypass embedding" is always injected at the END of chat flow, breaking KV prefix caching #57700
Originally created by @adhusch on GitHub (Dec 12, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/19905
Check Existing Issues
Installation Method
Docker
Open WebUI Version
v0.6.41
Ollama Version (if applicable)
N/A
Operating System
N/A
Browser (if applicable)
N/A
Confirmation
Expected Behavior
When using "bypass embedding and retrieval" and/or "using entire document" mode for knowledge added to a model, this knowledge should be included in the first message of the (JSON) chat flow and remain in the first position. That way prefix KV caching can work as expected: the potentially large knowledge part stays cached, and only the short follow-up question needs to be processed on each follow-up.
Actual Behavior
For some time now, the knowledge in "using entire document" mode has been added as the LAST element of the chat JSON, with follow-up questions injected BEFORE it. This breaks any reasonable KV caching, because the chat prefix changes with each follow-up, and chat responses are slow because the whole knowledge has to be reprocessed. See the minimal example below.
See also llama.cpp server issue https://github.com/ggml-org/llama.cpp/issues/17673
Potentially related openwebui issue #19099
Steps to Reproduce
I configured Open WebUI to "bypass embedding and retrieval" and added a test knowledge base that contains only one document with the text of the poem "The Raven". I configured this KB as "Use Entire Document" and intercepted the JSON requests using LiteLLM. The result is as described: the knowledge is always at the END of the JSON while the follow-up requests are injected BEFORE it. The prefix is therefore unstable, and the common prefix drops towards zero for a long-context KB (the prefix only matches the system/user prompt, which can be very short compared to the KB text, potentially consisting of several tens of thousands of tokens):
Initial request:
{
"model": "",
"stream": true,
"messages": [
{
"role": "system",
"content": "You are an assistant helping to interpret poems."
},
{
"role": "user",
"content": "# ### Task:\nRespond to the user query using the provided context, incorporating inline citations in the format [id] only when the tag includes an explicit id attribute (e.g., <source id="1">).\n\n### Guidelines:\n- If you don't know the answer, clearly state that.\n- If uncertain, ask the user for clarification.\n- Respond in the same language as the user's query.\n- If the context is unreadable or of poor quality, inform the user and provide the best possible answer.\n- If the answer isn't present in the context but you possess the knowledge, explain this to the user and provide the answer using your own understanding.\n- **Only include inline citations using [id] (e.g., [1], [2]) when the ta... (litellm_truncated skipped 6367 chars) ...e!"
Quoth the Raven, "Nevermore."
"Prophet!" said I, "thing of evil- prophet still, if bird or
devil!
By that Heaven that bends above us- by that God we both adore-
Tell this soul with sorrow laden if, within the distant Aidenn,
It shall clasp a sainted maiden whom the angels name Lenore-
Clasp a rare and radiant maiden whom the angels name Lenore."
Quoth the Raven, "Nevermore."
"Be that word our sign in parting, bird or fiend," I shrieked,
upstarting-
"Get thee back into the tempest and the Night's Plutonian shore!
Leave no black plume as a token of that lie thy soul hath spoken!
Leave my loneliness unbroken!- quit the bust above my door!
Take thy beak from out my heart, and take thy form from off my
door!"
Quoth the Raven, "Nevermore."
And the Raven, never flitting, still is sitting, still is sitting
On the pallid bust of Pallas just above my chamber door;
And his eyes have all the seeming of a demon's that is dreaming,
And the lamplight o'er him streaming throws his shadow on the
floor;
And my soul from out that shadow that lies floating on the floor
Shall be lifted- nevermore!\n\n\n<user_query>\nWhats the overall message of the poem?\n</user_query>\n\nWhats the overall message of the poem?"
}
],
"stream_options": {
"include_usage": true
}
}
Follow up request:
{
"model": "",
"stream": true,
"messages": [
{
"role": "system",
"content": "You are an assistant helping to interpret poems."
},
{
"role": "user",
"content": "Whats the overall message of the poem?"
},
{
"role": "assistant",
"content": "The overall message of the poem "The Raven" by Edgar Allan Poe is one of profound grief, loss, and the inability to escape the haunting memories of a loved one. The narrator, overwhelmed by sorrow for his lost love Lenore, is visited by a raven that repeatedly utters the word "Nevermore." This interaction underscores the narrator's despair and the inescapable nature of his sorrow. The raven serves as a symbol of the narrator's persistent and unyielding grief, emphasizing that there is no comfort or respite from his pain [1]."
},
{
"role": "user",
"content": "# ### Task:\nRespond to the user query using the provided context, incorporating inline citations in the format [id] only when the tag includes an explicit id attribute (e.g., <source id="1">).\n\n### Guidelines:\n- If you don't know the answer, clearly state that.\n- If uncertain, ask the user for clarification.\n- Respond in the same language as the user's query.\n- If the context is unreadable or of poor quality, inform the user and provide the best possible answer.\n- If the answer isn't present in the context but you possess the knowledge, explain this to the user and provide the answer using your own understanding.\n- **Only include inline citations using [id] (e.g., [1], [2]) when the ta... (litellm_truncated skipped 6415 chars) ...Prophet!" said I, "thing of evil- prophet still, if bird or
devil!
By that Heaven that bends above us- by that God we both adore-
Tell this soul with sorrow laden if, within the distant Aidenn,
It shall clasp a sainted maiden whom the angels name Lenore-
Clasp a rare and radiant maiden whom the angels name Lenore."
Quoth the Raven, "Nevermore."
"Be that word our sign in parting, bird or fiend," I shrieked,
upstarting-
"Get thee back into the tempest and the Night's Plutonian shore!
Leave no black plume as a token of that lie thy soul hath spoken!
Leave my loneliness unbroken!- quit the bust above my door!
Take thy beak from out my heart, and take thy form from off my
door!"
Quoth the Raven, "Nevermore."
And the Raven, never flitting, still is sitting, still is sitting
On the pallid bust of Pallas just above my chamber door;
And his eyes have all the seeming of a demon's that is dreaming,
And the lamplight o'er him streaming throws his shadow on the
floor;
And my soul from out that shadow that lies floating on the floor
Shall be lifted- nevermore!\n\n\n<user_query>\nWhat are some of the literary devices Poe uses in 'The Raven'?\n</user_query>\n\nWhat are some of the literary devices Poe uses in 'The Raven'?"
}
],
"stream_options": {
"include_usage": true
}
}
2nd Follow-up Request:
{
"model": "",
"stream": true,
"messages": [
{
"role": "system",
"content": "You are an assistant helping to interpret poems."
},
{
"role": "user",
"content": "Whats the overall message of the poem?"
},
{
"role": "assistant",
"content": "The overall message of the poem "The Raven" by Edgar Allan Poe is one of profound grief, loss, and the inability to escape the haunting memories of a loved one. The narrator, overwhelmed by sorrow for his lost love Lenore, is visited by a raven that repeatedly utters the word "Nevermore." This interaction underscores the narrator's despair and the inescapable nature of his sorrow. The raven serves as a symbol of the narrator's persistent and unyielding grief, emphasizing that there is no comfort or respite from his pain [1]."
},
{
"role": "user",
"content": "What are some of the literary devices Poe uses in 'The Raven'?"
},
{
"role": "assistant",
"content": "Edgar Allan Poe employs a variety of literary devices in "The Raven" to create its haunting and melancholic atmosphere. Some of the key literary devices include:\n\n1. Alliteration: Poe uses alliteration to create rhythm and musicality in the poem. For example, "Deep into that darkness peering, long I stood there wondering, fearing" [1].\n\n2. Repetition: The repetition of the word "Nevermore" by the raven creates a sense of inevitability and despair. This repetition also reinforces the theme of loss and the inability to escape grief [1].\n\n3. Symbolism: The raven itself is a powerful symbol of grief, loss, and the inescapable nature of sorrow. The bird's repeated utterance of "Nevermore" symbolizes the narrator's acceptance of his eternal grief [1].\n\n4. Imagery: Poe uses vivid and evocative imagery to create a sense of atmosphere and mood. For example, "Ah, distinctly I remember it was in the bleak December" sets a somber and bleak tone [1].\n\n5. Personification: The raven is personified as a visitor and a prophet, which adds to the eerie and supernatural atmosphere of the poem [1].\n\n6. Internal Rhyme: Poe uses internal rhyme to create a musical and rhythmic effect. For example, "Once upon a midnight dreary, while I pondered, weak and weary" [1].\n\n7. Refrain: The repeated use of the phrase "Nevermore" serves as a refrain, reinforcing the poem's themes and creating a sense of inevitability [1].\n\nThese literary devices work together to create the poem's haunting and melancholic atmosphere, emphasizing the narrator's profound grief and the inescapable nature of his sorrow."
},
{
"role": "user",
"content": "# ### Task:\nRespond to the user query using the provided context, incorporating inline citations in the format [id] only when the tag includes an explicit id attribute (e.g., <source id="1">).\n\n### Guidelines:\n- If you don't know the answer, clearly state that.\n- If uncertain, ask the user for clarification.\n- Respond in the same language as the user's query.\n- If the context is unreadable or of poor quality, inform the user and provide the best possible answer.\n- If the answer isn't present in the context but you possess the knowledge, explain this to the user and provide the answer using your own understanding.\n- **Only include inline citations using [id] (e.g., [1], [2]) when the ta... (litellm_truncated skipped 6433 chars) ..."thing of evil- prophet still, if bird or
devil!
By that Heaven that bends above us- by that God we both adore-
Tell this soul with sorrow laden if, within the distant Aidenn,
It shall clasp a sainted maiden whom the angels name Lenore-
Clasp a rare and radiant maiden whom the angels name Lenore."
Quoth the Raven, "Nevermore."
"Be that word our sign in parting, bird or fiend," I shrieked,
upstarting-
"Get thee back into the tempest and the Night's Plutonian shore!
Leave no black plume as a token of that lie thy soul hath spoken!
Leave my loneliness unbroken!- quit the bust above my door!
Take thy beak from out my heart, and take thy form from off my
door!"
Quoth the Raven, "Nevermore."
And the Raven, never flitting, still is sitting, still is sitting
On the pallid bust of Pallas just above my chamber door;
And his eyes have all the seeming of a demon's that is dreaming,
And the lamplight o'er him streaming throws his shadow on the
floor;
And my soul from out that shadow that lies floating on the floor
Shall be lifted- nevermore!\n\n\n<user_query>\nCan you explain the significance of the raven's appearance in the poem?\n</user_query>\n\nCan you explain the significance of the raven's appearance in the poem?"
}
],
"stream_options": {
"include_usage": true
}
}
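The effect on the shared prefix can be estimated offline. The following is a minimal sketch, not the actual tokenization done by any server: it flattens each request with a made-up `render` helper and compares characters instead of tokens. `KB`, `A1`, `Q2` and the helper names are placeholders chosen for illustration.

```python
from os.path import commonprefix

# Stand-in for the knowledge base text (placeholder, not the real document).
KB = "Once upon a midnight dreary, while I pondered, weak and weary. " * 100
SYSTEM = {"role": "system", "content": "You are an assistant helping to interpret poems."}
Q1 = "Whats the overall message of the poem?"
A1 = "Grief and loss."
Q2 = "What literary devices does Poe use?"


def render(messages):
    """Flatten a chat into one string, a crude stand-in for a chat template."""
    return "".join(f"<{m['role']}>{m['content']}\n" for m in messages)


def shared_prefix_ratio(prev, curr):
    """Fraction of the current prompt that is a common prefix with the previous one."""
    a, b = render(prev), render(curr)
    return len(commonprefix([a, b])) / len(b)


def user(text):
    return {"role": "user", "content": text}


def assistant(text):
    return {"role": "assistant", "content": text}


# Layout observed in the intercepted requests: knowledge travels with the newest user message.
kb_last_1 = [SYSTEM, user(KB + Q1)]
kb_last_2 = [SYSTEM, user(Q1), assistant(A1), user(KB + Q2)]

# Layout requested in this issue: knowledge stays in the first user message.
kb_first_1 = [SYSTEM, user(KB + Q1)]
kb_first_2 = [SYSTEM, user(KB + Q1), assistant(A1), user(Q2)]

ratio_last = shared_prefix_ratio(kb_last_1, kb_last_2)
ratio_first = shared_prefix_ratio(kb_first_1, kb_first_2)
print(f"knowledge last:  {ratio_last:.1%} of the prompt is a reusable prefix")
print(f"knowledge first: {ratio_first:.1%} of the prompt is a reusable prefix")
```

With the knowledge at the end, only the system prompt and the opening of the first user message match. With the knowledge pinned to the first user message, the whole previous request is a prefix of the next one.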
Logs & Screenshots
See steps to reproduce
Additional Information
Always inject the knowledge at the beginning, immediately after the system prompt, and keep it there at all costs.
@owui-terminator[bot] commented on GitHub (Dec 12, 2025):
🔍 Similar Issues Found
I found some existing issues that might be related to this one. Please check if any of these are duplicates or contain helpful solutions:
#19752 issue: minor UI Bug: knowledge sharing
by mahenning • Dec 04, 2025 • bug
#19864 issue:
by Haervwe • Dec 10, 2025 • bug
#19777 issue:
by Yaute7 • Dec 05, 2025 • bug
#19861 issue:
by QuitHub • Dec 10, 2025 • bug
#19877 issue:
by dotmobo • Dec 11, 2025 • bug
#19563 issue:
by naruto7g • Nov 28, 2025 • bug
#19491 issue: Referencing Knowledge in chats with '#' no longer shows which knowledge base does the file belong to
by donatas-xyz • Nov 25, 2025 • bug
#18158 issue: Knowledge Base ignores embedding and loads full sources
by Polak149 • Oct 08, 2025 • bug
#19211 issue:
by Byrnes9 • Nov 16, 2025 • bug
#16794 issue: When "Knowledge" is disabled for users, it is still possible to add "Knowledge" to chat Folders
by jsweetzer-ea • Aug 21, 2025 • bug
💡 Tips:
This comment was generated automatically by a bot. Please react with a 👍 if this comment was helpful, or a 👎 if it was not.
@silentoplayz commented on GitHub (Dec 16, 2025):
I had Gemini 3 Pro Preview help to test this issue out on my end by having it look at my Ollama logs and it believes that I have reproduced this issue. Here's how/why:
Gemini 3 Pro Preview's response:
Based on the debug logs you provided, I can confirm that the bug is happening.
Your system is not using the cached document for the second message. It is re-reading the entire file, which confirms that the text order has changed (the huge file likely moved to the end of the prompt).
Here is the evidence from your logs:
1. The "Smoking Gun" Log Entry
Look at the request at 01:10:40. This is your follow-up message ("Tell me more").
prompt=14048: This is the total size of the request (Chat History + The File).
cache=720: This is how much of the previous context Ollama found in memory. This is the failure.
With a working cache, this number would be around 12000 or 13000.
At 720, it means only the System Prompt (instructions) matched. The massive file content did not match the cache, meaning it was moved to a different position in the JSON array (likely the end).
2. The Timing Evidence
If the cache were working (i.e., if the bug were fixed), the second request should have taken less than 1 second to start generating, because the heavy lifting (reading the file) was already done. Instead, it did all the work again.
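For scale, the `prompt` and `cache` values quoted from the log can be put into a ratio. This is only arithmetic on those two reported numbers; the variable names are illustrative and are not Ollama field names.

```python
# Values taken from the log entry quoted in this comment.
prompt_tokens = 14048  # total prompt size of the follow-up request
cached_tokens = 720    # tokens that could be reused from the cache

reprocessed_tokens = prompt_tokens - cached_tokens
reuse_ratio = cached_tokens / prompt_tokens

print(f"reused:      {cached_tokens} tokens ({reuse_ratio:.1%})")
print(f"reprocessed: {reprocessed_tokens} tokens")
```

That is roughly 5% of the prompt reused and 13328 tokens processed again for a short follow-up.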
Conclusion
The issue is reproducible on your setup. Open WebUI is appending the document to the end of the chat history on follow-up messages, which invalidates the KV Cache in Ollama, forcing a full re-process every time you talk.
To fix this locally (until the devs patch it):
You would need to modify the Open WebUI source code (specifically looking for where payload.messages is constructed in the backend) to ensure the file content remains inserted at messages[0] or messages[1], rather than appended to the end.
2nd attempt verdict (as a screenshot):
@adhusch commented on GitHub (Dec 17, 2025):
Maybe you could consider adding the static knowledge as a system message instead of appending it to a user message. In the current middleware logic, system messages seem to always go on top (being inserted instead of appended), while user messages are always appended at the end of the chat list (potentially with content appended to them). The "append" flag name is a bit confusing: it seems to control only "appending" to a message (meaning postfix concatenation to a string), but _not_ appending vs. inserting into the message list, which is decided statically based on the role, system vs. user.
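To make the distinction concrete, here is a minimal sketch of the system-message variant. `upsert_system_message` is a hypothetical helper written for this comment, not the actual Open WebUI middleware function. Its `append` flag only chooses the string order inside the message, while the message itself always stays at index 0 of the list.

```python
def upsert_system_message(messages, text, append=True):
    """Return a new message list whose leading system message contains `text`.

    `append` only decides whether `text` goes after (True) or before (False)
    the existing system prompt. The list position is always index 0.
    """
    out = [dict(m) for m in messages]  # copy so the caller's messages are untouched
    if out and out[0].get("role") == "system":
        old = out[0]["content"]
        out[0]["content"] = f"{old}\n{text}" if append else f"{text}\n{old}"
    else:
        out.insert(0, {"role": "system", "content": text})
    return out


SYSTEM = {"role": "system", "content": "You are an assistant helping to interpret poems."}
KB = "<full document text>"  # placeholder for the knowledge content

turn1 = upsert_system_message(
    [SYSTEM, {"role": "user", "content": "Q1"}], KB
)
turn2 = upsert_system_message(
    [
        SYSTEM,
        {"role": "user", "content": "Q1"},
        {"role": "assistant", "content": "A1"},
        {"role": "user", "content": "Q2"},
    ],
    KB,
)
# The first two messages are identical across turns, so the prefix is stable.
print(turn1[:2] == turn2[:2])
```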
@Podden commented on GitHub (Dec 22, 2025):
I saw this behaviour in earlier versions as well. Before, it was added to the system prompt.
But for chunked retrieval, this new behaviour is actually great for caching! I can check whether the conversation is using RAG and just exclude the newest message from being cached (e.g. in my Anthropic pipe). But for the entire document content, adding it to the system prompt would be more appropriate. The problem is that, as a pipe dev, AFAIK I currently can't tell the difference.
I don't have a reliable way to know whether the user is referencing knowledge or a file, nor which retrieval mode applies to which reference. I would love to be able to handle all the different cases in a cache-friendly manner. Maybe I do not want the content to be added at all because I want to attach the entire PDF file as base64 or something. Please add metadata for this or something.
Easiest fix for now:
Chunked retrieval = add to newest message
Entire context/Bypass embedding = add to system prompt like before
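The rule above could look roughly like this. It is a sketch under my own assumptions: `RetrievalMode` and `inject_context` are made-up names, and how the backend would actually know the mode is exactly the metadata that is missing today.

```python
from enum import Enum


class RetrievalMode(Enum):
    CHUNKED = "chunked"            # per-turn retrieved chunks, change every request
    FULL_CONTEXT = "full_context"  # entire document / bypass embedding, static


def inject_context(messages, context, mode):
    """Return a copy of `messages` with `context` placed according to `mode`."""
    out = [dict(m) for m in messages]
    if mode is RetrievalMode.FULL_CONTEXT:
        # Static content goes into the system prompt so the prefix never moves.
        if out and out[0]["role"] == "system":
            out[0]["content"] = out[0]["content"] + "\n\n" + context
        else:
            out.insert(0, {"role": "system", "content": context})
    else:
        # Volatile content goes into the newest user message; history stays cacheable.
        for m in reversed(out):
            if m["role"] == "user":
                m["content"] = context + "\n\n" + m["content"]
                break
    return out


msgs = [
    {"role": "system", "content": "You are an assistant helping to interpret poems."},
    {"role": "user", "content": "Q1"},
    {"role": "assistant", "content": "A1"},
    {"role": "user", "content": "Q2"},
]
full = inject_context(msgs, "DOC", RetrievalMode.FULL_CONTEXT)
chunked = inject_context(msgs, "CHUNK", RetrievalMode.CHUNKED)
print(full[0]["content"])
print(chunked[-1]["content"])
```

In both branches the earlier turns are left untouched, so only the part that really changes between requests falls outside the cached prefix.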
@Classic298 commented on GitHub (Dec 30, 2025):
Strongly related: https://github.com/open-webui/open-webui/issues/12459