Mirror of https://github.com/open-webui/open-webui.git (synced 2026-05-07 11:28:35 -05:00)
[GH-ISSUE #638] Separate ollama instance for titles #50822
Originally created by @robertvazan on GitHub (Feb 3, 2024).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/638
Is your feature request related to a problem? Please describe.
Auto-generated titles cause serious performance problems. They thrash the KV cache in llama.cpp, so the whole prompt and the first response have to be processed again after a follow-up question, which is slow on CPU. Titles also take time to generate, sometimes a lot of time if the model gets stuck in a loop. If a separate model is used for titles, time is wasted loading it and then reloading the main model.
Describe the solution you'd like
Make it possible to configure a second ollama instance dedicated to titles. Query this second instance concurrently, so that the user can continue chatting.
Things get a bit messy with models, especially custom ones. A lazy way to do this is to offer only the models available in the second ollama instance. A more complete solution is to add a new settings tab for title generation, which would allow model download and customization (prompt, max tokens).
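The proposed flow can be sketched roughly as follows. This is a minimal illustration, not Open WebUI's implementation: the second-instance URL, the title prompt, and the callback are all assumptions made up for the example. It uses Ollama's real `/api/generate` endpoint and fires the request on a background thread so the chat is not blocked, with `num_predict` capped so a looping model cannot stall title generation indefinitely.

```python
import json
import threading
import urllib.request

# Hypothetical URL of the dedicated title instance (assumption, not a real
# Open WebUI setting); Ollama's default port is 11434, so a second instance
# might run on 11435.
TITLE_OLLAMA_URL = "http://localhost:11435"

def build_title_request(model: str, conversation: str) -> dict:
    """Build an Ollama /api/generate payload asking for a short title."""
    return {
        "model": model,
        "prompt": f"Summarize this conversation in 3-5 words:\n\n{conversation}",
        "stream": False,
        # Cap generated tokens so a model stuck in a loop cannot run forever.
        "options": {"num_predict": 16},
    }

def generate_title_async(model: str, conversation: str, on_done) -> threading.Thread:
    """Query the title instance on a background thread; chat continues meanwhile."""
    def worker():
        payload = json.dumps(build_title_request(model, conversation)).encode()
        req = urllib.request.Request(
            f"{TITLE_OLLAMA_URL}/api/generate",
            data=payload,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            on_done(json.loads(resp.read())["response"].strip())

    t = threading.Thread(target=worker, daemon=True)
    t.start()
    return t
```

Because the request goes to a separate instance, the main instance's KV cache for the ongoing chat is left untouched.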
Describe alternatives you've considered
Llama.cpp supports multiple concurrent sessions (or "slots"), which would fix the KV cache thrashing, but ollama does not support it yet. Neither llama.cpp nor ollama has support for loading and running multiple models concurrently.
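For reference, the llama.cpp slots mentioned above are enabled with the server's `--parallel` flag; a rough sketch (flag names as in recent llama.cpp server builds, model path made up for the example):

```shell
# Run the llama.cpp server with 2 slots so a title request and the ongoing
# chat each keep their own KV cache; context is split across slots.
./llama-server -m ./model.gguf --parallel 2 -c 8192
```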
@tjbck commented on GitHub (Feb 3, 2024):
Duplicate #278