[GH-ISSUE #4783] How do I customize the number of layers to be loaded on GPU? #3012

Closed
opened 2026-04-12 13:25:02 -05:00 by GiteaMirror · 1 comment
Owner

Originally created by @lingyezhixing on GitHub (Jun 2, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/4783

Originally assigned to: @dhiltgen on GitHub.

I tried adding the num_gpu parameter to the modelfile, but it doesn't seem to work.
My GPU still has 1.4 GB of memory free, but I can't make use of it.
What's even more frustrating is that only one layer is placed on the CPU, yet this greatly reduces inference speed.
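For context, this is roughly the modelfile stanza I used (the base model name is just a placeholder; `PARAMETER num_gpu` is the documented way to set the number of GPU-offloaded layers in a Modelfile):

```
FROM llama3
# Ask Ollama to offload this many layers to the GPU
PARAMETER num_gpu 33
```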

GiteaMirror added the question and needs more info labels 2026-04-12 13:25:02 -05:00

@dhiltgen commented on GitHub (Jun 18, 2024):

Can you try setting num_gpu in the API request?

```
curl http://localhost:11434/api/generate -d '{
  "model": "myCustomModel",
  "prompt": "Why is the sky blue?",
  "stream": false,
  "options": {"num_gpu": 5}
}'
```

Then check `ollama ps` and the server log to see if it worked. If the system isn't behaving properly, please share the logs and the repro scenario so we can try to understand what's going wrong. If that does work but the modelfile setting doesn't, please share a minimal modelfile that reproduces the problem.
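For anyone scripting this, here is an equivalent sketch in Python that builds the same request body as the curl example above (the model name and layer count are just examples; sending is left to whatever HTTP client you use):

```python
import json

# Build the same /api/generate request body as the curl example.
payload = {
    "model": "myCustomModel",           # example model name
    "prompt": "Why is the sky blue?",
    "stream": False,
    "options": {"num_gpu": 5},          # layers to offload to the GPU
}
body = json.dumps(payload)
# POST `body` to http://localhost:11434/api/generate,
# e.g. with urllib.request or the requests library.
```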

