mirror of
https://github.com/open-webui/open-webui.git
synced 2026-03-11 00:04:08 -05:00
feat: Allow parameter control for task model #6176
Originally created by @Gjarllarhorn on GitHub (Aug 22, 2025).
Check Existing Issues
Problem Description
When using a large model (model size > VRAM, so it spills into system memory) together with a separate task model, the large model is ejected once a chat completes so the task model can be loaded (since both don't fit in memory), adding extra delay.
Desired Solution you'd like
Add advanced parameter control for the task model. The specific use case is keeping the task model loaded indefinitely and running it on the CPU rather than the GPU.
This would help with larger models that take up all available VRAM. When a larger model is loaded, all VRAM is used and the remainder spills into system memory; if a separate task model is selected, the large model is ejected and the task model is loaded into the freed VRAM. This adds extra time, as the large model then has to be loaded back into memory.
By allowing control over the task model's parameters, it could be tuned to keep a very small/efficient model loaded in system memory and run on the CPU, for example by setting:
• num_gpu: 0
• keep_alive: -1
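For illustration, these two settings map directly onto Ollama's generate API, which accepts `num_gpu` inside `options` and `keep_alive` at the top level of each request. Below is a minimal sketch of the payload such a task-model call could send; the endpoint, model name, and prompt are assumptions, not part of Open WebUI's current behavior:

```python
import json

# Hypothetical payload for Ollama's POST /api/generate endpoint.
# num_gpu: 0   -> run this model entirely on CPU, leaving VRAM untouched
# keep_alive: -1 -> keep the model resident in memory indefinitely
payload = {
    "model": "qwen2.5:0.5b",  # assumed small/efficient task model
    "prompt": "Generate a short title for this chat.",
    "options": {"num_gpu": 0},
    "keep_alive": -1,
}

print(json.dumps(payload, indent=2))
```

If Open WebUI exposed these as per-task-model advanced parameters, the chat model's VRAM allocation would never be disturbed by title/tag generation.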
Alternatives Considered
No response
Additional Context
No response