[GH-ISSUE #1855] I miss option to specify num of gpu layers as model parameter #47571

Closed
opened 2026-04-28 04:15:32 -05:00 by GiteaMirror · 5 comments
Owner

Originally created by @JoseConseco on GitHub (Jan 8, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/1855

The two most-used parameters for GGUF models are, in my opinion, the temperature and the number of GPU layers for the model to use.
But the number of GPU layers is 'baked' into the Ollama Modelfile.
This means we have to create a new model, with a new number of GPU layers, just to change it.

Yes, I understand the number of GPU layers is not something that can be changed after the model is loaded. But still, creating a new Modelfile just to change the GPU layer offloading parameter is overkill, in my opinion.


@BruceMacD commented on GitHub (Jan 8, 2024):

Thanks for the feedback @JoseConseco, as of the last few versions of Ollama you can actually specify this in the interactive mode.

```
ollama run llama2
>>> /set parameter num_gpu 12
Set parameter 'num_gpu' to '12'

>>>
```

Does that help your use-case?
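The same per-session override can also be sent per-request through Ollama's REST API (`POST /api/generate`) via the `options` field. A minimal sketch of the request body; the model name and prompt are placeholders, and actually sending it requires a running Ollama server:

```python
import json

# Build a /api/generate request body that overrides num_gpu for this
# request only (no Modelfile change needed). Model/prompt are examples.
payload = {
    "model": "llama2",
    "prompt": "Why is the sky blue?",
    "options": {
        "num_gpu": 12,  # number of layers to offload to the GPU
    },
    "stream": False,
}

body = json.dumps(payload)
print(body)

# To actually send it (requires a running Ollama server on the default port):
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:11434/api/generate",
#     data=body.encode(),
#     headers={"Content-Type": "application/json"})
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["response"])
```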


@JoseConseco commented on GitHub (Jan 8, 2024):

Awesome. It was one of the most annoying things about Ollama (having to create a custom model just to change GPU layers).
While `/set parameter num_gpu 12` works, the model is reloaded on the next prompt after setting the GPU layers.
I will have to test whether this helps when the model is too big to load into VRAM. I suppose in that case Ollama will just error out, and I won't be able to `/set parameter num_gpu 12`, right? In that case the user would still have to create a new Modelfile...
Is there a similar option to set GPU layers from the beginning, like `ollama run model.xyz -gpu-layer n`?


@BruceMacD commented on GitHub (Jan 9, 2024):

@JoseConseco setting it as a flag isn't an option right now; however, there is a lot of work going on right now to load the optimal number of layers by default when a model is run.


@yangmingming commented on GitHub (Jun 20, 2024):

We can set the value of num_gpu through the following command, but I couldn't find how to get the current value. Also, what indicators can be used as a reference when setting this value?

```
/set parameter num_gpu
```

@dhiltgen commented on GitHub (Jul 24, 2024):

Ultimately the goal is that users shouldn't have to adjust the num_gpu setting, and Ollama should load the optimal number of layers given the available VRAM. If there are bugs that cause us to load an incorrect number of layers, this setting can be used to work around them until we get them fixed. The setting can be specified via the CLI, the API, or within the Modelfile. I'm also planning to add another mechanism to work around memory prediction bugs in #5922.
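For the Modelfile route mentioned above, the override is a one-line `PARAMETER` directive. A minimal sketch (the base model `llama2` and the layer count are placeholders):

```
FROM llama2
PARAMETER num_gpu 12
```

Building this with `ollama create mymodel -f Modelfile` bakes the override in, which is exactly the workflow the interactive `/set parameter num_gpu` command lets you skip.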

Reference: github-starred/ollama#47571