[GH-ISSUE #9766] Ollama 0.6.0 cannot use CUDA #6384

Closed
opened 2026-04-12 17:53:33 -05:00 by GiteaMirror · 6 comments

Originally created by @wwshs on GitHub (Mar 14, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/9766

What is the issue?

Ollama 0.6.0 for Windows, on a Windows 11 workstation.
Ollama 0.6.0 cannot use CUDA: it does not use the GPU, and everything runs on the CPU.
I've gone back to Ollama 0.5.13.
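
A quick way to confirm whether a loaded model landed on the GPU or the CPU (standard Ollama and NVIDIA commands, not from the original report):

```shell
# Load a model, then ask Ollama which processor it is running on;
# the PROCESSOR column shows e.g. "100% GPU" or "100% CPU".
ollama run gemma3:4b "hello"
ollama ps

# Cross-check with the NVIDIA driver: VRAM usage should rise and an
# ollama process should appear while the model is loaded.
nvidia-smi
```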

Relevant log output

(none provided)

OS

Windows

GPU

Nvidia

CPU

Intel

Ollama version

0.6.0

GiteaMirror added the bug label 2026-04-12 17:53:33 -05:00

@jmorganca commented on GitHub (Mar 14, 2025):

@wwshs sorry to hear. Do you have the [logs](https://github.com/user-attachments/files/18643815/server.log) by chance?
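
On Windows, Ollama's logs live under %LOCALAPPDATA%\Ollama, with the server log in server.log (per the Ollama troubleshooting docs). A minimal way to pull the GPU-detection lines:

```shell
# Open the Ollama log directory in Explorer (Windows).
explorer %LOCALAPPDATA%\Ollama

# Or, from a cmd prompt, show log lines mentioning CUDA or GPU.
findstr /i "cuda gpu" "%LOCALAPPDATA%\Ollama\server.log"
```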

@mehditahmasebi commented on GitHub (Mar 14, 2025):

me too

@Uncle-Enzo commented on GitHub (Mar 14, 2025):

I had the same issue in WSL with Gemma3:4b. But after fixing it by forcing CUDA usage, I now see a lot of this:

```shell
ggml_cuda_host_malloc: failed to allocate 0.00 MiB of pinned memory: out of memory
ggml_cuda_host_malloc: failed to allocate 0.00 MiB of pinned memory: out of memory
ggml_cuda_host_malloc: failed to allocate 0.01 MiB of pinned memory: out of memory
ggml_cuda_host_malloc: failed to allocate 0.01 MiB of pinned memory: out of memory
ggml_cuda_host_malloc: failed to allocate 0.00 MiB of pinned memory: out of memory
ggml_cuda_host_malloc: failed to allocate 0.00 MiB of pinned memory: out of memory
```

Very odd.
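
The comment does not say how CUDA usage was forced; one common approach (an assumption, not necessarily what was done here) is to pin the device with CUDA_VISIBLE_DEVICES and request full offload via the num_gpu request option. The pinned-memory messages themselves are warnings rather than hard errors: ggml falls back to unpinned host memory when these allocations fail, and WSL in particular is often reported to restrict pinned memory.

```shell
# Pin Ollama to the first CUDA device (assumes a single-GPU WSL setup
# with no Ollama service already running).
export CUDA_VISIBLE_DEVICES=0
ollama serve &

# Request full GPU offload: num_gpu is the number of layers to offload,
# and a large value offloads all of them.
curl http://localhost:11434/api/generate -d '{
  "model": "gemma3:4b",
  "prompt": "hello",
  "options": { "num_gpu": 99 }
}'
```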

@Mohamed0Hegazi commented on GitHub (Mar 17, 2025):

https://code.superxiang.com/

@seanfarr788 commented on GitHub (Mar 17, 2025):

I had this issue when my install was a locally built version of Ollama. I think it may have to do with a mixture of installation methods, but I'm not certain. Running

`sudo kill $(pidof ollama)`

seemed to fix it.
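
A slightly fuller version of that workaround (a sketch, assuming a Linux or WSL machine where a stale server process may be shadowing the intended binary):

```shell
# Stop any running Ollama server, including a stale locally built one.
sudo kill $(pidof ollama)

# Confirm nothing is left running.
pidof ollama || echo "no ollama process running"

# Restart the intended binary and watch its startup output for CUDA detection.
ollama serve
```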

@molkemon commented on GitHub (Apr 3, 2025):

I am getting the same thing. I am using a PowerShell script to feed several hundred text files to a model to replace dynamic localisation with generic wording for a game mod (e.g. [Root.Monarch.GetName] should be replaced with "our ruler" or something like that). It works fine, but after a few hundred generations it starts spamming the "couldn't allocate 0 MB of VRAM" error.

I have 24 GB of VRAM and it happens with both gemma3:27b and gemma3:12b, even though with both I still have some VRAM left (a lot with the 12b model, obviously).

I really don't see a reason why this would happen. I don't think I have enabled history or anything (unless I have to specifically tell the API to turn it off? But I can't find anything in the documentation). Generation continues, but way slower, so I have to stop my script and then start it again. It will continue where it left off, but this is still very annoying.
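
On the history question: /api/generate is stateless unless the caller feeds the returned context back into the next request, so server-side history is unlikely to be accumulating. One mitigation worth trying (a sketch, not a confirmed fix) is unloading the model between batches so its VRAM is released, either per request with keep_alive or explicitly with ollama stop:

```shell
# Unload the model as soon as this request finishes.
curl http://localhost:11434/api/generate -d '{
  "model": "gemma3:12b",
  "prompt": "hello",
  "keep_alive": 0
}'

# Or unload an already-loaded model explicitly.
ollama stop gemma3:12b
```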

Reference: github-starred/ollama#6384