MAJOR BUG: system prompt can get duplicated when calling several tools in one call #4039

Closed
opened 2025-11-11 15:45:00 -06:00 by GiteaMirror · 0 comments
Owner

Originally created by @thiswillbeyourgithub on GitHub (Feb 21, 2025).

Bug Report

Installation Method

Docker

Environment

  • Open WebUI Version: 0.5.16

Confirmation:

  • [ x ] I have read and followed all the instructions provided in the README.md.
  • [ x ] I am on the latest version of both Open WebUI and Ollama.
  • [ x ] I have included the browser console logs.
  • [ x ] I have included the Docker container logs.
  • [ x ] I have provided the exact steps to reproduce the bug in the "Steps to Reproduce" section below.

Expected Behavior:

No messages should get duplicated

Actual Behavior:

I can see in Langfuse that my system prompt gets replicated within the same message:

```
THIS IS A SYSTEM PROMPT
THIS IS A SYSTEM PROMPT
```

Description

Bug Summary:
I was investigating unexpectedly high inference costs using Langfuse and noticed that some messages had implausibly long system prompts. It seems that when an LLM makes several tool calls in a single message, the system prompt content can get duplicated.

Reproduction Details

Steps to Reproduce:

Apologies, but I don't yet know how to reliably reproduce the issue. I have observed it even with a fresh model, with no filters, and using ordinary tools such as the calculator tool.

Given the severity of this issue, it seemed better to open an issue early rather than wait a few days until I could reproduce it confidently.

My intuition is that this happens when a default model is set up (perhaps without a system prompt), a user model with a system prompt is defined on top of it, and tool calling is done on that user model. Asking that model for several tool calls in one LLM turn seems to trigger it. I am using LiteLLM as a backend and tried deactivating all filters and customizations to confirm the issue was not on my side.
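The suspected mechanism can be sketched as follows: a hypothetical tool-call loop that re-applies the model's system prompt on every iteration instead of once per turn. All names here are illustrative, not Open WebUI's actual code.

```python
def apply_system_prompt(messages, system_prompt):
    """Naive helper: concatenates the system prompt onto the message list.

    If this runs once per tool-call round instead of once per turn,
    the prompt accumulates inside the single system message.
    """
    if messages and messages[0]["role"] == "system":
        # BUG: prepends unconditionally instead of checking whether
        # the prompt is already present
        messages[0]["content"] = system_prompt + "\n" + messages[0]["content"]
    else:
        messages.insert(0, {"role": "system", "content": system_prompt})
    return messages

# Simulate three tool calls handled within one LLM turn
messages = [{"role": "user", "content": "2+2, then 3*3, then sqrt(16)"}]
for _ in range(3):  # one application per tool invocation
    messages = apply_system_prompt(messages, "THIS IS A SYSTEM PROMPT")

print(messages[0]["content"])
# The system prompt now appears three times in a single message.
```

A guard that checks whether the prompt is already present, or that applies the system prompt exactly once per turn rather than per tool call, would avoid the accumulation.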

Browsing Langfuse, it seems my roughly 2000-token system prompt was duplicated up to 10 times; I was definitely not expecting such expensive calls.

I don't know in which version this started happening.

It also seems important to fix the Langfuse pipes and filters, which apparently broke recently, so that other users can debug this.

Unfortunately I don't have time to investigate this much further, but I hope this is useful.

Reference: github-starred/open-webui#4039