[GH-ISSUE #2877] feat: allow setting num_gpu parameter #51715

Closed
opened 2026-05-05 12:52:58 -05:00 by GiteaMirror · 9 comments
Owner

Originally created by @mherrmann3 on GitHub (Jun 6, 2024).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/2877

Is your feature request related to a problem? Please describe.
To avoid creating a new modelfile just to adjust/finetune the number of layers offloaded to the GPU, make this setting (num_gpu) user-configurable; ollama considers it one of the most common parameters (https://github.com/ollama/ollama/blob/de5beb06b314eb4950c5a0de8183dfadb325fc8b/cmd/interactive.go#L161).

Describe the solution you'd like
Implement and add 'num_gpu (Ollama)' in the 'Advanced Params' section of a model in Workspace > Models.
(I would not add it to the 'Advanced Parameters' section of Settings > General, as the number and size of layers are model- and quant-specific.)

Describe alternatives you've considered
Well, creating a new ollama model(file) with an adjusted num_gpu, but this is cumbersome if one wants to adjust/finetune num_gpu (or modify it quickly when the GPU is running other things/models).

Additional context
num_gpu is not specified [1] as a valid PARAMETER in the official ollama docs (https://github.com/ollama/ollama/blob/main/docs/modelfile.md#parameter), but it is supported by the API (https://github.com/ollama/ollama/blob/main/docs/api.md#generate-request-with-options).


  [1] Like use_mmap, use_mlock, and num_thread, which are already configurable in open-webui.
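To illustrate the API support mentioned above: the /api/generate endpoint accepts num_gpu inside the options object of a request body, so no Modelfile change is needed. A minimal sketch (the model name and prompt are illustrative, not from this issue):

```python
import json


def build_generate_request(model: str, prompt: str, num_gpu: int) -> str:
    """Build an Ollama /api/generate request body that overrides num_gpu.

    num_gpu sets how many model layers to offload to the GPU.
    """
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,
        # Per-request override -- no new Modelfile needed.
        "options": {"num_gpu": num_gpu},
    }
    return json.dumps(payload)


# e.g. ask for all 33 layers of a hypothetical model to be offloaded:
body = build_generate_request("mixtral:8x7b", "Hello", 33)
```

POSTing such a body to a running Ollama instance's /api/generate would apply the override for that request only.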


@Qualzz commented on GitHub (Jul 4, 2024):

bumping this


@derpyhue commented on GitHub (Jul 4, 2024):

https://github.com/derpyhue/openwebui_num_gpu/commit/fff91f7f436df157525677ce47b1c61ba1c3fbba
By editing these files, it becomes possible to change the number of num_gpu layers.
Might need a bit of polishing.
This is my first time committing something to GitHub, but I hope it helps!
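As a rough illustration of what the commit above is doing (the function and parameter names here are hypothetical, not the actual open-webui code), a backend-side merge of user-set advanced parameters into the options forwarded to Ollama might look like:

```python
def merge_advanced_params(base_options: dict, advanced: dict) -> dict:
    """Merge user-set advanced params (e.g. num_gpu) into the options
    dict forwarded to Ollama, skipping any that were left unset (None)."""
    merged = dict(base_options)
    for key, value in advanced.items():
        if value is not None:
            merged[key] = value
    return merged


# A user sets num_gpu in the UI but leaves num_thread untouched:
options = merge_advanced_params(
    {"num_ctx": 4096},
    {"num_gpu": 20, "num_thread": None},
)
```

Skipping None values matters here: an unset field in the UI should fall back to Ollama's own defaults rather than being sent as an explicit null.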


@JKratto commented on GitHub (Jul 10, 2024):

+1
Thank you for this. I am looking forward to the merge. Ollama changed its memory allocation strategy (or so I think), and suddenly the whole model "does not fit in VRAM" (only 30/33 layers are offloaded to the GPU). In reality, it can fit 33/33 while still leaving about 25 % of VRAM free. The performance penalty in the 30/33 scenario is about a 70 % loss of throughput (Mixtral 8x7b). Being able to adjust this setting for my machine would go a long way, as I would not have to create my own model to overcome this issue. It's not a big problem; it just seems cleaner. :)


@mherrmann3 commented on GitHub (Jul 10, 2024):

FYI @JKratto @Qualzz: if you want to "+1" or "bump" this issue, it may be more effective to simply add a 👍 to the original message, as this may be the devs' preferred way to spot highly wanted features (that is, sorting issues by the number of 👍: https://github.com/open-webui/open-webui/issues?q=is%3Aissue+is%3Aopen+sort%3Areactions-%2B1-desc) 😉


@silentoplayz commented on GitHub (Jul 11, 2024):

> derpyhue/openwebui_num_gpu@fff91f7 By editing these files it will enable the use of changing num_gpu layers. Might need a bit of polishing. This is my first time committing something to GitHub but i hope it helps!

I'm personally not opposed to someone adding this as a feature in Open WebUI, but we need to carefully consider the implications of exposing num_gpu along with num_ctx (already available) as user-editable options. When used in combination, changing these values can be disruptive, since it requires a model reload in llama.cpp. As such, we may want to consider implementing controls to prevent users from modifying these settings unnecessarily. With that being said, if you're willing to submit a PR following our contributor guidelines, I think it's worth the consideration and effort involved. Thanks for sharing the link to your forked commit. Perhaps the maintainer of the repo can take a closer look and discuss the implications further!
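The reload concern above can be sketched in a few lines (the option set and function name are hypothetical, not Open WebUI's or Ollama's actual code): a backend could compare load-time options between consecutive requests to detect when a change would force a model reload.

```python
# Assumption: these options affect how the model is loaded, so changing
# either between requests forces a reload in llama.cpp.
RELOAD_TRIGGERING_OPTIONS = {"num_gpu", "num_ctx"}


def requires_reload(previous: dict, current: dict) -> bool:
    """Return True if any load-time option differs between two requests."""
    return any(
        previous.get(key) != current.get(key)
        for key in RELOAD_TRIGGERING_OPTIONS
    )
```

A check like this could back the "controls to prevent users from modifying these settings unnecessarily" mentioned above, e.g. by warning before applying a change that would evict the loaded model.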


@JKratto commented on GitHub (Jul 11, 2024):

@silentoplayz I would suggest leaving those options in the advanced/expert-user area, hidden by default. Adding a note such as "Changing these settings can drastically impact the stability of your setup. Do not change them unless you know what you are doing." should do the trick?

@mherrmann3 sure, added the +1. :) Wanted to add more information to this issue for other folks maybe researching the same issue. :)


@malallama commented on GitHub (Jul 12, 2024):

bumping this up 💪🏻


@5E-324 commented on GitHub (Aug 17, 2024):

I implemented this in pull request #4554, which is already merged. This issue can be closed now.


@silentoplayz commented on GitHub (Aug 17, 2024):

> I implemented this in pull request #4554, which is already merged. This issue can be closed now.

Thanks for the heads up. I'll close this issue now.


Reference: github-starred/open-webui#51715