[GH-ISSUE #23269] issue: Pyodide prompt injection constantly poisons the token cache (native tool calling) #19937

Closed
opened 2026-04-20 02:29:48 -05:00 by GiteaMirror · 5 comments
Owner

Originally created by @arbv on GitHub (Mar 31, 2026).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/23269

Check Existing Issues

  • I have searched for any existing and/or related issues.
  • I have searched for any existing and/or related discussions.
  • I have also searched in the CLOSED issues AND CLOSED discussions and found no related items (your issue might already be addressed on the development branch!).
  • I am using the latest version of Open WebUI.

Installation Method

Docker

Open WebUI Version

v0.8.12

Ollama Version (if applicable)

No response

Operating System

NixOS

Browser (if applicable)

No response

Confirmation

  • I have read and followed all instructions in README.md.
  • I am using the latest version of both Open WebUI and Ollama.
  • I have included the browser console logs.
  • I have included the Docker container logs.
  • I have provided every relevant configuration, setting, and environment variable used in my setup.
  • I have clearly listed every relevant configuration, custom setting, environment variable, and command-line option that influences my setup (such as Docker Compose overrides, .env values, browser settings, authentication configurations, etc).
  • I have documented step-by-step reproduction instructions that are precise, sequential, and leave nothing to interpretation. My steps:
  • Start with the initial platform/version/OS and dependencies used,
  • Specify exact install/launch/configure commands,
  • List URLs visited, user input (incl. example values/emails/passwords if needed),
  • Describe all options and toggles enabled or changed,
  • Include any files or environmental changes,
  • Identify the expected and actual result at each stage,
  • Ensure any reasonably skilled user can follow and hit the same issue.

Expected Behavior

CODE_INTERPRETER_PYODIDE_PROMPT is static content that does not change between turns or sessions. It should be appended to the system prompt once, where static instructional context belongs architecturally. This makes it part of the stable cached prefix that LLM providers can reuse across turns without re-billing.

Actual Behavior

CODE_INTERPRETER_PYODIDE_PROMPT is appended ephemerally to the last user message at request assembly time and is not persisted to conversation storage. This happens on every turn whenever the Pyodide engine is active, including when native tool calling mode is enabled, not only during active code execution.

Because the injection is ephemeral, the following mismatch occurs on every turn:

  • Turn N is sent to the API as [...history, userN + PYODIDE_PROMPT] and written to the provider cache.
  • Turn N+1 presents [...history, userN_clean, assistantN, userN+1 + PYODIDE_PROMPT], where userN_clean does not match the cached userN + PYODIDE_PROMPT.

The prefix mismatch starts at userN, invalidating the entire accumulated conversation history. The full conversation is charged at regular input token price instead of the much cheaper cache read rate on every single turn.
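The mismatch described above can be sketched in a few lines. This is illustrative code, not Open WebUI's actual assembly logic: `assemble` stands in for the ephemeral injection in `middleware.py`, and `common_prefix_len` counts how many leading messages two consecutive requests share (the portion a provider's prefix cache can reuse).

```python
# Stand-in for CODE_INTERPRETER_PYODIDE_PROMPT (static across turns).
PYODIDE_PROMPT = "\n\n<pyodide instructions>"

def assemble(history, user_msg):
    """Mimic request assembly: the prompt is glued onto the final user
    message only, and this modified copy is never written back to storage."""
    return history + [{"role": "user", "content": user_msg + PYODIDE_PROMPT}]

def common_prefix_len(a, b):
    """Number of leading messages two requests share (the cacheable prefix)."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

# Turn N: the provider caches the request including the injected prompt.
turn_n = assemble([], "What is 2+2?")

# Storage persists only the *clean* user message plus the assistant reply,
# so turn N+1 replays userN without the prompt.
history = [{"role": "user", "content": "What is 2+2?"},
           {"role": "assistant", "content": "4"}]
turn_n1 = assemble(history, "And 3+3?")

# The requests diverge at the very first message: the cached prefix from
# turn N is unusable and the whole history is re-billed at full price.
assert common_prefix_len(turn_n, turn_n1) == 0
```

Because the divergence point is `userN` itself, no amount of later history survives the cache; every turn pays full input price for the entire conversation.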

Steps to Reproduce

  1. Enable the Code Interpreter (configured to use Pyodide engine) in settings. Set native tool calling for the active model.
  2. Start a multi-turn conversation with any model on a provider that supports prefix caching (GPT, Gemini).
  3. Observe guaranteed cache miss tokens in provider usage logs on every subsequent turn, regardless of whether code execution was invoked.
  4. Disable the Code Interpreter in the chat window and make a few more turns in quick succession. You will hit the cache (and save some cost).

This is especially costly in long conversations.


Logs & Screenshots

N/A, but here are the links to the relevant parts of the code:

  1. The offending prompt itself: https://github.com/open-webui/open-webui/blob/9bd84258d09eefe7bf975878fb0e31a5dadfe0f8/backend/open_webui/config.py#L2142
  2. The code which appends it: https://github.com/open-webui/open-webui/blob/9bd84258d09eefe7bf975878fb0e31a5dadfe0f8/backend/open_webui/utils/middleware.py#L2375

Additional Information

This issue is specific to the Pyodide engine; the Jupyter engine does not exhibit this behaviour. Tool definitions themselves are also correctly placed in a stable position and do not cause this problem.

The fix is to append CODE_INTERPRETER_PYODIDE_PROMPT to the system prompt when the Pyodide engine is active, consistent with how tool definitions are handled.
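A minimal sketch of the proposed fix, assuming an OpenAI-style message list. `apply_pyodide_prompt` is a hypothetical helper (not an existing Open WebUI function): it folds the static prompt into the system message, so it lives in the stable cached prefix rather than in the volatile last user turn.

```python
def apply_pyodide_prompt(messages, pyodide_prompt):
    """Fold a static instructional prompt into the system message.
    Returns a new list; the caller's messages are not mutated."""
    messages = [dict(m) for m in messages]
    for m in messages:
        if m["role"] == "system":
            m["content"] = m["content"] + "\n\n" + pyodide_prompt
            return messages
    # No system message yet: prepend one carrying only the prompt.
    return [{"role": "system", "content": pyodide_prompt}] + messages

msgs = [{"role": "system", "content": "You are helpful."},
        {"role": "user", "content": "hi"}]
out = apply_pyodide_prompt(msgs, "<pyodide instructions>")
assert out[0]["content"].endswith("<pyodide instructions>")
assert out[1] == {"role": "user", "content": "hi"}  # user turns stay clean
```

Since the system message is identical on every turn while the Pyodide engine is active, the injected prompt becomes part of the prefix that providers can cache and reuse.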

GiteaMirror added the bug label 2026-04-20 02:29:48 -05:00
Author
Owner

@arbv commented on GitHub (Mar 31, 2026):

The ability to effectively set CODE_INTERPRETER_PYODIDE_PROMPT to "" via an env variable would be a solution as well (although a half-baked one). IMO, the prompt content belongs in the system prompt, where it should be added alongside the native tool definitions.

Note: for non-native tool calling the current approach (the most recent user message annotation) is understandable and acceptable.

Author
Owner

@arbv commented on GitHub (Mar 31, 2026):

Also, on a slightly related note:

The same historical prefix invalidation occurs when chatting with documents, where the RAG template is injected ephemerally into the last user message. Unlike the Pyodide prompt, RAG content is dynamic per query and cannot be relocated to the system prompt. However, the ephemeral injection still invalidates the cached prefix from the previous turn onward, meaning the accumulated conversation history is re-billed from scratch on every turn where RAG is active.

Something to think about, I guess. Chatting with documents should not wrap the user message in native tool calling mode for the same reason. Though that is a completely different, albeit vaguely related, issue.

Author
Owner

@trevorhayes6561-maker commented on GitHub (Mar 31, 2026):

This makes sense — injecting the Pyodide prompt into each user message
breaks prefix caching and unnecessarily increases cost. Moving it to the
system prompt when the engine is active would align with how tool
definitions are handled and avoid cache invalidation. The env override
could help short-term, but fixing the placement seems like the right
long-term solution.

Author
Owner

@tjbck commented on GitHub (Apr 1, 2026):

Refactored to use system prompt in the latest dev.

Author
Owner

@arbv commented on GitHub (Apr 1, 2026):

@tjbck Thank you!


Reference: github-starred/open-webui#19937