[GH-ISSUE #12116] recent ollama no longer loads models on GPU VRAM #70111

Closed
opened 2026-05-04 20:22:38 -05:00 by GiteaMirror · 11 comments
Owner

Originally created by @constantin-ungureanu-github on GitHub (Aug 29, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/12116

What is the issue?

I updated to the latest version, and models are no longer loaded on the GPUs when running.
This is a regression; models used to load on the GPUs (version 0.11.0).

I'm not sure where the regression happened, probably v0.11.5 or v0.11.6. It also reproduces on 0.11.6.

I run 2 Nvidia 5090 FE GPUs on Fedora Linux.
Reproduced with various models that should fit into VRAM (and did fit until the recent updates).

Relevant log output


OS

Linux

GPU

Nvidia

CPU

AMD

Ollama version

0.11.8

GiteaMirror added the bug label 2026-05-04 20:22:38 -05:00

@rick-github commented on GitHub (Aug 29, 2025):

[Server logs](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#how-to-troubleshoot-issues) will help in debugging.
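
On a systemd-based Linux install, the server logs can be collected roughly like this (a sketch based on the ollama troubleshooting docs; the unit name `ollama` assumes the standard Linux install script was used):

```shell
# Capture the most recent ollama server log lines to a file.
journalctl -u ollama --no-pager -n 500 > ollama.log

# For more detail, enable debug logging before reproducing, e.g. via
# `systemctl edit ollama` and adding:
#   [Service]
#   Environment="OLLAMA_DEBUG=1"
```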


@constantin-ungureanu-github commented on GitHub (Aug 29, 2025):

[nvidia.txt](https://github.com/user-attachments/files/22044105/nvidia.txt)

[ollama.log](https://github.com/user-attachments/files/22044106/ollama.log)


@rick-github commented on GitHub (Aug 29, 2025):

```
Aug 29 12:00:56 linux ollama[12550]: time=2025-08-29T12:00:56.249+02:00 level=INFO source=gpu.go:604 msg="no nvidia devices detected by library /usr/lib64/libcuda.so.575.64.05"
```

What's the output of `lsmod`?

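A quick way to run the check rick-github is asking for (a sketch; `modinfo -F license` is used here because the open NVIDIA kernel module reports a "Dual MIT/GPL" license while the proprietary one reports "NVIDIA"):

```shell
# Is an NVIDIA kernel module loaded at all?
lsmod | grep -i '^nvidia' || echo "no nvidia module loaded"

# Which flavor is installed? Open module: "Dual MIT/GPL"; proprietary: "NVIDIA".
modinfo -F license nvidia 2>/dev/null || true
```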

@constantin-ungureanu-github commented on GitHub (Aug 29, 2025):

[lsmod.txt](https://github.com/user-attachments/files/22044684/lsmod.txt)


@constantin-ungureanu-github commented on GitHub (Aug 29, 2025):

The ollama logs show:

```
Aug 29 12:19:00 linux ollama[12550]: llama_model_load_from_file_impl: using device CUDA0 (NVIDIA GeForce RTX 5090) - 31085 MiB free
Aug 29 12:19:00 linux ollama[12550]: llama_model_load_from_file_impl: using device CUDA1 (NVIDIA GeForce RTX 5090) - 31085 MiB free
...
Aug 29 12:19:00 linux ollama[12550]: load_tensors: loading model tensors, this can take a while... (mmap = false)
Aug 29 12:19:05 linux ollama[12550]: load_tensors: offloading 0 repeating layers to GPU
Aug 29 12:19:05 linux ollama[12550]: load_tensors: offloaded 0/81 layers to GPU
Aug 29 12:19:05 linux ollama[12550]: load_tensors: CUDA_Host model buffer size = 39979.48 MiB
Aug 29 12:19:05 linux ollama[12550]: load_tensors: CPU model buffer size = 563.62 MiB
```


@rick-github commented on GitHub (Aug 29, 2025):

```
sudo dmesg | grep -i nv
```

@constantin-ungureanu-github commented on GitHub (Aug 29, 2025):

[nv_dmesg.txt](https://github.com/user-attachments/files/22044925/nv_dmesg.txt)


@rick-github commented on GitHub (Aug 29, 2025):

```
[    6.261427] nvidia-modeset: Loading NVIDIA UNIX Open Kernel Mode Setting Driver for x86_64  575.64.05  Release Build  (dvs-builder@U22-A23-13-1)  Fri Jul 18 15:48:34 UTC 2025
```

https://github.com/ollama/ollama/issues/11932#issuecomment-3194037633


@constantin-ungureanu-github commented on GitHub (Sep 1, 2025):

After some hassle, I managed to install the drivers from the Nvidia repo for Fedora 42, following the guide described here: https://docs.nvidia.com/cuda/cuda-installation-guide-linux

I can confirm that after this, ollama does use the GPUs.

Why shouldn't it also run with the Nvidia open drivers?
Your choice whether to close this or not, or to make any fixes.


@constantin-ungureanu-github commented on GitHub (Sep 10, 2025):

I updated the Nvidia drivers and hit this issue again.
The new Nvidia GPUs have to use the open kernel driver; the proprietary one is meant for older GPUs.

Installing the drivers from the Nvidia repo solved this again, but I had to use the open drivers.

https://developer.nvidia.com/blog/nvidia-transitions-fully-towards-open-source-gpu-kernel-modules/

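After a driver update like this, a quick sanity check that ollama is actually offloading to VRAM (a sketch; `ollama ps` reports where a loaded model is resident, and the `nvidia-smi` query flags are standard):

```shell
# The PROCESSOR column should read "100% GPU" for a fully offloaded model.
ollama ps

# Cross-check VRAM usage at the driver level.
nvidia-smi --query-gpu=memory.used,memory.total --format=csv
```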

@rick-github commented on GitHub (Sep 10, 2025):

Yes, it seems that the more recent open drivers work successfully: https://discord.com/channels/1128867683291627614/1412908989263380612

I'm going to close this as resolved but feel free to re-open if you have problems.


Reference: github-starred/ollama#70111