[GH-ISSUE #5397] V0.1.48 The model is loaded into the GPU Memory but runs on the CPU #3377

Closed
opened 2026-04-12 14:00:30 -05:00 by GiteaMirror · 5 comments

Originally created by @wxtt522 on GitHub (Jul 1, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/5397

Originally assigned to: @dhiltgen on GitHub.

What is the issue?

ollama run gemma2:27b
![image](https://github.com/ollama/ollama/assets/28422636/313df4c1-141f-4c61-aeeb-3525cc6fd975)
The same goes for loading other models. It was normal in the previous version. I did not change any environment variables.

OS

Windows

GPU

Nvidia

CPU

Intel

Ollama version

0.1.48

GiteaMirror added the "bug", "needs more info", "windows" labels 2026-04-12 14:00:30 -05:00

@issaccv commented on GitHub (Jul 2, 2024):

I also encountered the same problem. According to my investigation, it is caused by OLLAMA_NUM_PARALLEL. When this variable is set to a large value (such as 16 or 32), Ollama estimates a memory footprint larger than the available VRAM, which causes some layers not to be offloaded to the GPU.
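For context, a rough sketch of why a high OLLAMA_NUM_PARALLEL can push layers off the GPU (the helper and all numbers below are illustrative assumptions, not Ollama's actual memory accounting): the KV cache is sized per parallel request slot, so the estimated footprint grows with the parallel count.

```python
# Illustrative only -- not Ollama's real memory accounting.
# The KV cache is allocated once per parallel request slot, so the
# estimated footprint grows linearly with OLLAMA_NUM_PARALLEL.
def estimated_footprint_gb(weights_gb: float, kv_per_slot_gb: float,
                           num_parallel: int) -> float:
    return weights_gb + kv_per_slot_gb * num_parallel

vram_gb = 24  # hypothetical card
print(estimated_footprint_gb(16, 0.5, 1))   # 16.5 -> fits, fully offloaded
print(estimated_footprint_gb(16, 0.5, 32))  # 32.0 -> exceeds 24 GB, some layers stay on CPU
```

Once the estimate exceeds VRAM, the scheduler keeps the overflow layers on the CPU, which matches the CPU/GPU split reported by `ollama ps`.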


@dhiltgen commented on GitHub (Jul 2, 2024):

@wxtt522 can you share your server log so I can see what's going on? Can you also share what ollama ps says?
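For anyone else following along: on Windows the server log lives under `%LOCALAPPDATA%\Ollama` per the Ollama troubleshooting docs. The command below assumes a default install.

```shell
:: Open the Ollama log folder on Windows (default install location)
explorer %LOCALAPPDATA%\Ollama
:: server.log in that folder records the memory.available and layer-offload decisions
```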


@wxtt522 commented on GitHub (Jul 3, 2024):

> @wxtt522 can you share your server log so I can see what's going on? Can you also share what `ollama ps` says?

[server-1.log](https://github.com/user-attachments/files/16076630/server-1.log)

Of course, this is my server log. I have adjusted the OLLAMA_NUM_PARALLEL parameter mentioned above, but it has no effect.
I noticed memory.available in the log, which is not normal. The model started to be loaded into system memory once only about 4 GB of the P40's GPU memory had been used, but I didn't find any other way to adjust this.


@wxtt522 commented on GitHub (Jul 3, 2024):

(base) PS C:\Users\admin> ollama ps
NAME                                      ID            SIZE   PROCESSOR        UNTIL
deepseek-coder-v2:16b-lite-instruct-q8_0  44250301ba51  18 GB  83%/17% CPU/GPU  Forever


@wxtt522 commented on GitHub (Jul 3, 2024):

Although I don't know the reason, I solved the problem by resetting the graphics card driver configuration.

Reference: github-starred/ollama#3377