[PR #20317] [MERGED] fix: inject full context knowledge into system message for KV prefix caching #48613

Closed
opened 2026-04-30 00:37:27 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/open-webui/open-webui/pull/20317
Author: @Classic298
Created: 1/1/2026
Status: Merged
Merged: 1/5/2026
Merged by: @tjbck

Base: `dev` ← Head: `kv-cache`


📝 Commits (5)

  • 3247d82 Update middleware.py
  • 0fef6e6 Update middleware.py
  • ecd9c73 env var
  • 1e197b4 address
  • 54e490c upd

📊 Changes

2 files changed (+27 additions, -9 deletions)

View changed files

📝 backend/open_webui/env.py (+4 -0)
📝 backend/open_webui/utils/middleware.py (+23 -9)

📄 Description

PR Description

Summary

Fixes #20301 - Knowledge provided as "entire document" / "bypass embedding" is now injected into the system message instead of the last user message, enabling KV prefix caching to work correctly.

Problem

Previously, full-document knowledge was always appended to the last user message. Since the last user message changes every turn, the knowledge block constantly shifted position in the serialized prompt, invalidating the KV prefix cache and forcing the model to re-process the entire document on every interaction.
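The effect can be sketched in a few lines. This is an illustrative toy, not Open WebUI code: a KV prefix cache can only reuse computation for the longest shared prefix between consecutive prompts, so where the document lands in the serialized prompt decides how much is reusable.

```python
def serialize(messages):
    """Flatten a chat into the prompt string the model actually sees."""
    return "".join(f"<|{m['role']}|>{m['content']}" for m in messages)

def shared_prefix_len(a, b):
    """Length of the longest common prefix, i.e. the cacheable portion."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

DOC = "<full document text, possibly tens of thousands of tokens>"

# Old behavior: knowledge appended to the *last* user message each turn,
# so on turn 2 the document moves from message 1 to message 3.
turn1_old = [{"role": "user", "content": "Summarize this.\n" + DOC}]
turn2_old = [
    {"role": "user", "content": "Summarize this."},
    {"role": "assistant", "content": "Here is a summary..."},
    {"role": "user", "content": "Now list key dates.\n" + DOC},
]

# New behavior: knowledge lives in the system message, a stable prefix.
sys_msg = {"role": "system", "content": "Use this context:\n" + DOC}
turn1_new = [sys_msg, {"role": "user", "content": "Summarize this."}]
turn2_new = [
    sys_msg,
    {"role": "user", "content": "Summarize this."},
    {"role": "assistant", "content": "Here is a summary..."},
    {"role": "user", "content": "Now list key dates."},
]

old_shared = shared_prefix_len(serialize(turn1_old), serialize(turn2_old))
new_shared = shared_prefix_len(serialize(turn1_new), serialize(turn2_new))
```

With the old placement the shared prefix ends before the document; with the new placement the entire system message, document included, stays cacheable across turns.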

Solution

  • Static knowledge (full context / entire document mode) is now injected into the system message, which remains stable at the start of the conversation
  • Dynamic content (web search results, tool outputs) continues to be injected into the user message to avoid caching stale data
  • Uses rag_template with an empty prompt parameter to preserve formatting instructions while keeping the system message static across turns
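A minimal sketch of the routing described above. Function and template names here are illustrative stand-ins, not the actual `middleware.py` implementation; the real `rag_template` helper is assumed only to fill a context and query into a prompt template.

```python
def rag_template(template: str, context: str, query: str) -> str:
    """Stand-in for Open WebUI's rag_template: fill placeholders."""
    return template.replace("[context]", context).replace("[query]", query)

TEMPLATE = "Use the following context to answer:\n[context]\n[query]"

def inject_context(messages, context, is_static):
    """Route static full-document context into the system message (stable
    prompt prefix); keep volatile context (web search, tool output) in the
    last user message so stale data is not cached across turns."""
    messages = [dict(m) for m in messages]
    if is_static:
        # Empty query keeps the system message identical on every turn.
        block = rag_template(TEMPLATE, context, "")
        if messages and messages[0]["role"] == "system":
            messages[0]["content"] += "\n" + block
        else:
            messages.insert(0, {"role": "system", "content": block})
    else:
        block = rag_template(TEMPLATE, context, messages[-1]["content"])
        messages[-1] = {"role": "user", "content": block}
    return messages
```

For example, `inject_context(msgs, doc_text, True)` prepends (or extends) a system message containing the document, while `inject_context(msgs, search_results, False)` wraps only the last user message, matching the static/dynamic split in the bullets above.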

Contributor License Agreement

By submitting this pull request, I confirm that I have read and fully agree to the Contributor License Agreement (CLA), and I am providing my contributions under its terms.

Note

Deleting the CLA section will lead to immediate closure of your PR and it will not be merged in.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

Reference: github-starred/open-webui#48613