[GH-ISSUE #21499] feat: support truncating chat messages for task models #35030

Closed
opened 2026-04-25 09:14:24 -05:00 by GiteaMirror · 4 comments
Owner

Originally created by @daanknoope on GitHub (Feb 16, 2026).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/21499

Check Existing Issues

  • I have searched for all existing open AND closed issues and discussions for similar requests. I have found none that is comparable to my request.

Verify Feature Scope

  • I have read through and understood the scope definition for feature requests in the Issues section. I believe my feature request meets the definition and belongs in the Issues section instead of the Discussions.

Problem Description

The Open WebUI implementation for generating titles and tags is currently not optimised for conversations with very long chats (e.g. when summarising documents or analysing code). The current implementation sends full chat messages to the task model. For a chat where the user pasted a large amount of text, this means the task model has to fully process all of it, only to generate a title or tag.

This has two concrete problems:

  • For local model users, it forces the task model to process too much information, causing long delays.
  • For API users, this can cause unnecessarily high bills (like in https://github.com/open-webui/open-webui/issues/15081, which has the same root cause).

This behaviour is caused by the implementation of the task model prompt template. While the template does allow for limiting the amount of messages to send to the task model, it does not allow limiting the size of the chat messages themselves. Therefore, a single very large chat would be sent straight to the task model, causing the delays.

Desired Solution you'd like

The current prompt templating for title generation allows the following options:

  • MESSAGES: all messages are inserted into the prompt
  • MESSAGES:START:n: the first n messages are inserted
  • MESSAGES:END:n: the last n messages are inserted
  • MESSAGES:MIDDLETRUNCATE:n: the first n/2 and last n/2 messages are inserted.
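As a rough illustration (not Open WebUI's actual code; the helper name and signature are assumptions), the message-selection variants above could be sketched like this:

```python
def select_messages(messages, variant="ALL", n=0):
    """Pick which chat messages go into the task-model prompt.

    Hypothetical sketch of the template variants described above:
    ALL, START:n, END:n, and MIDDLETRUNCATE:n.
    """
    if variant == "ALL" or n <= 0 or n >= len(messages):
        return list(messages)
    if variant == "START":
        return messages[:n]            # first n messages
    if variant == "END":
        return messages[-n:]           # last n messages
    if variant == "MIDDLETRUNCATE":
        half = n // 2                  # first n/2 and last n/2 messages
        return messages[:half] + messages[-half:]
    raise ValueError(f"unknown variant: {variant}")
```

Note that none of these variants bounds the size of an individual message, which is exactly the gap this request addresses.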

I would propose to extend the prompt template with an optional :MAXCHARS:n suffix, limiting the size of each message to at most n characters.

For example, MESSAGES:START:5:MAXCHARS:500 inserts the first 5 messages, and limits all messages individually to at most 500 characters.

The truncation should probably keep the beginning and end of messages, since prompts typically put intent and topic at those places.
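The proposed per-message cap could look something like the following minimal sketch of the suggested behaviour (the function name and the `" [...] "` marker are assumptions, not part of any existing implementation):

```python
def truncate_middle(text, max_chars):
    """Cap a message at max_chars characters, keeping its beginning and end.

    Illustrative sketch of the proposed MAXCHARS behaviour: the head and
    tail of the message are kept, since intent and topic usually sit there,
    and the middle is replaced with a marker.
    """
    if max_chars <= 0 or len(text) <= max_chars:
        return text
    marker = " [...] "
    keep = max_chars - len(marker)
    if keep <= 1:
        # Budget too small to split meaningfully; just hard-truncate.
        return text[:max_chars]
    head = (keep + 1) // 2   # slightly favour the beginning
    tail = keep - head
    return text[:head] + marker + text[-tail:]
```

With a template like `MESSAGES:START:5:MAXCHARS:500`, each of the five selected messages would then be passed through something like `truncate_middle(message, 500)` before being inserted into the prompt.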

Alternatives Considered

Using a smaller task model: when running locally, this would either require swapping the main model in and out of (V)RAM for follow-up questions or would permanently occupy (V)RAM. Neither is ideal. For API-based users the cost would decrease, but token usage would remain inefficient.

Additional Context

No response


@daanknoope commented on GitHub (Feb 16, 2026):

Let me know if you'd like to see this implemented in Open WebUI - would be happy to write the changes!


@adhusch commented on GitHub (Feb 19, 2026):

IMHO a very good suggestion.


@tjbck commented on GitHub (Mar 8, 2026):

Addressed in dev.


@Classic298 commented on GitHub (Mar 8, 2026):

https://github.com/open-webui/open-webui/commit/9d8f590fc516d8ba80e2eaa86f82a6a37f9e4b51
Reference: github-starred/open-webui#35030