Mirror of https://github.com/open-webui/open-webui.git, synced 2026-05-06 02:48:13 -05:00
[GH-ISSUE #21499] feat: support truncating chat messages for task models #35030
Originally created by @daanknoope on GitHub (Feb 16, 2026).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/21499
Problem Description
The Open WebUI implementation for generating titles and tags is currently not optimised for conversations containing very long chat messages (e.g. when summarising documents or analysing code). The current implementation sends the full chat messages to the task model. For a chat where the user pasted a large amount of text, the task model has to process all of that text only to generate a title or tag.
This has two concrete problems: for locally hosted task models, title and tag generation is noticeably delayed; for API-based task models, tokens are spent on content that is irrelevant to the task. This behaviour is caused by the implementation of the task model prompt template: while the template allows limiting the number of messages sent to the task model, it does not allow limiting the size of the messages themselves. A single very large chat message is therefore sent straight to the task model, causing the delays.
Desired Solution
The current prompt templating for title generation allows the following options:

- `MESSAGES`: all messages are inserted into the prompt
- `MESSAGES:START:n`: the first n messages are inserted
- `MESSAGES:END:n`: the last n messages are inserted
- `MESSAGES:MIDDLETRUNCATE:n`: the first n/2 and the last n/2 messages are inserted

I would propose extending the prompt template with an optional `:MAXCHARS:n` suffix, limiting each message to at most n characters. For example, `MESSAGES:START:5:MAXCHARS:500` inserts the first 5 messages and limits each of them individually to at most 500 characters. The truncation should probably keep the beginning and end of each message, since prompts typically put intent and topic in those places.
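A minimal sketch of the proposed behaviour, assuming a middle-truncation strategy that keeps the head and tail of each message. The function and variable names are illustrative only, not Open WebUI's actual implementation, and only the `START`/`END` base specs are handled for brevity:

```python
def truncate_middle(text: str, max_chars: int, marker: str = "\n[...]\n") -> str:
    """Keep the beginning and end of a message, dropping the middle,
    so at most max_chars of the original text remain (plus the marker)."""
    if len(text) <= max_chars:
        return text
    head = max_chars // 2
    tail = max_chars - head
    return text[:head] + marker + text[-tail:]


def render_messages(messages: list[str], spec: str) -> str:
    """Interpret a template spec such as 'MESSAGES:START:5:MAXCHARS:500'.

    The optional MAXCHARS suffix is stripped first, then the base spec
    selects which messages to include, and finally each selected message
    is individually truncated.
    """
    parts = spec.split(":")
    max_chars = None
    if "MAXCHARS" in parts:
        i = parts.index("MAXCHARS")
        max_chars = int(parts[i + 1])
        parts = parts[:i]  # handle the base spec without the suffix

    if len(parts) >= 3 and parts[1] == "START":
        messages = messages[: int(parts[2])]
    elif len(parts) >= 3 and parts[1] == "END":
        messages = messages[-int(parts[2]):]

    if max_chars is not None:
        messages = [truncate_middle(m, max_chars) for m in messages]
    return "\n".join(messages)
```

For example, `render_messages(chat, "MESSAGES:START:5:MAXCHARS:500")` would insert the first five messages, each reduced to its first 250 and last 250 characters when longer than 500.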
Alternatives Considered
Using a smaller task model: when running locally, this would either require swapping the main model in and out of (V)RAM for follow-up questions or permanently occupy (V)RAM with a second model. Neither is ideal. For API-based users the cost would decrease, but token usage remains inefficient.
@daanknoope commented on GitHub (Feb 16, 2026):
Let me know if you'd like to see this implemented in Open WebUI - would be happy to write the changes!
@adhusch commented on GitHub (Feb 19, 2026):
IMHO a very good suggestion.
@tjbck commented on GitHub (Mar 8, 2026):
Addressed in dev.
@Classic298 commented on GitHub (Mar 8, 2026):
9d8f590fc5