[GH-ISSUE #2877] feat: allow setting num_gpu parameter
#51715
Originally created by @mherrmann3 on GitHub (Jun 6, 2024).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/2877
Is your feature request related to a problem? Please describe.
To avoid creating a new modelfile only for adjusting/finetuning the number of layers offloaded to the GPU, make this setting (`num_gpu`), which ollama considers one of the most common parameters, user-configurable.

Describe the solution you'd like
Implement and add 'num_gpu (Ollama)' in the 'Advanced Params' section of a model in Workspace > Models.
(I would not add it to the 'Advanced Parameters' section of Settings > General, as the number and size of layers is model- and quant-specific.)
Describe alternatives you've considered
Well, creating a new ollama model(file) with an adjusted `num_gpu`, but this is cumbersome if one wants to adjust/finetune `num_gpu` (or modify it quickly if the GPU runs other things/models).

Additional context
`num_gpu` is not specified¹ in the official ollama docs as a valid PARAMETER, but is supported by the API.

¹ Like `use_mmap`, `use_mlock`, and `num_thread`, which are already configurable in open_webui.
@Qualzz commented on GitHub (Jul 4, 2024):
bumping this
@derpyhue commented on GitHub (Jul 4, 2024):
fff91f7f43
By editing these files, it enables changing the num_gpu layers.
Might need a bit of polishing.
This is my first time committing something to GitHub, but I hope it helps!
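To give readers the flavor of this kind of change (a hypothetical sketch, not the actual commit above): the backend change amounts to merging a user-configured `num_gpu` into the options payload forwarded to Ollama. All function and field names here are invented for illustration.

```python
# Hypothetical sketch of the kind of change described above (not the
# actual diff in commit fff91f7f43): merge a user-set num_gpu into the
# options dict sent to Ollama. All names here are invented.
def apply_advanced_params(payload: dict, user_params: dict) -> dict:
    options = payload.setdefault("options", {})
    if user_params.get("num_gpu") is not None:
        # Number of layers to offload to the GPU; model- and quant-specific.
        options["num_gpu"] = int(user_params["num_gpu"])
    return payload


payload = apply_advanced_params(
    {"model": "mixtral:8x7b", "prompt": "Hello"},
    {"num_gpu": 33},
)
assert payload["options"]["num_gpu"] == 33
```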
@JKratto commented on GitHub (Jul 10, 2024):
+1
Thank you for this. I am looking forward to the merge. Ollama changed the memory allocation strategy (or so I think), and suddenly the whole model "does not fit VRAM" (only 30/33 layers are offloaded to the GPU). But in reality it can fit 33/33 while still leaving about 25% of VRAM free. The performance penalty for the 30/33 scenario is a roughly 70% loss of throughput (Mixtral 8x7b). Being able to adjust this setting for my machine would go a long way, as I would not have to create my own model to overcome this issue. It's not a big problem; it just seems cleaner. :)
@mherrmann3 commented on GitHub (Jul 10, 2024):
FYI @JKratto @Qualzz: if you want to "+1" or "bump" this issue, it may be more effective to simply add a 👍 to the original message, as this may be the devs' preferred way to spot highly wanted features (that is, sorting issues by the number of 👍) 😉
@silentoplayz commented on GitHub (Jul 11, 2024):
I'm personally not opposed to someone adding this as a feature in Open WebUI, but we need to carefully consider the implications of exposing `num_gpu` along with `num_ctx` (already available) as user-editable options. When used in combination, changing these values can be disruptive, since it requires a model reload in llama.cpp. As such, we may want to consider implementing controls to prevent users from modifying these settings unnecessarily. With that being said, if you're willing to submit a PR following our contributor guidelines, I think it's worth the consideration and effort that may be involved. Thanks for sharing the link to your forked commit. Perhaps the maintainer of the repo can take a closer look and discuss the implications further!
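To make the reload concern above concrete, here is an editor's sketch (timings and values are placeholders): two consecutive requests that differ only in `num_gpu` force Ollama to reload the model, so the second call pays a cold-start penalty.

```python
import time

import requests

# Editor's sketch of the reload cost discussed above. Changing num_gpu
# (or num_ctx) between requests forces Ollama to reload the model, so
# the second call below should be noticeably slower than a repeat of
# the first. Assumes a local Ollama server; model name is a placeholder.
def timed_generate(options: dict) -> float:
    start = time.monotonic()
    requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "mixtral:8x7b",
            "prompt": "Hi",
            "stream": False,
            "options": options,
        },
    )
    return time.monotonic() - start

print("num_gpu=30:", timed_generate({"num_gpu": 30}))
print("num_gpu=33:", timed_generate({"num_gpu": 33}))  # triggers a reload
```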
@JKratto commented on GitHub (Jul 11, 2024):
@silentoplayz I would suggest leaving those options in the advanced/expert-user area, hidden by default. Adding an info note like "Changing these settings can drastically impact the stability of your setup. Do not change unless you know what you are doing." should do the trick?
@mherrmann3 sure, added the +1. :) Wanted to add more information to this issue for other folks maybe researching the same issue. :)
@malallama commented on GitHub (Jul 12, 2024):
bumping this up 💪🏻
@5E-324 commented on GitHub (Aug 17, 2024):
I implemented this in pull request #4554, which is already merged. This issue can be closed now.
@silentoplayz commented on GitHub (Aug 17, 2024):
Thanks for the heads up. I'll close this issue now.