[GH-ISSUE #7952] Problems (with nvidia-smi) after upgrading to 0.4.7 (from 0.3 series) #5091

Closed
opened 2026-04-12 16:11:19 -05:00 by GiteaMirror · 2 comments
Owner

Originally created by @stronk7 on GitHub (Dec 5, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/7952

What is the issue?

Hi,

While testing the new 0.4.7 series, everything seems to be working fine on Mac, but I've detected a problem when running on Ubuntu 24.04 with Docker.

More specifically, the problem is with `nvidia-smi`, because, unless I'm wrong, the GPU is being used normally (not the CPU) — only the reporting is off.

With 0.3.13, I get this (all correct):

![ollama0 3 13](https://github.com/user-attachments/assets/7ed47eea-cce0-434b-8e67-781411a2e371)

But once I switch to 0.4.7, I get this:

![ollama0 4 7](https://github.com/user-attachments/assets/84dbc896-be72-47d3-8271-988ae7ac20ee)

Note that, while the "total" memory usage displays fine (more or less similar to the 0.3.x one), the per-process memory is not shown at all, although the processes themselves are listed.
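As a way to cross-check what the default `nvidia-smi` table shows, the per-process figures can also be queried directly. This is just an illustrative sketch (the query flags are standard `nvidia-smi` options, but the guard and fallback message are mine):

```shell
# Query per-compute-process GPU memory directly, bypassing the table view.
if command -v nvidia-smi >/dev/null 2>&1; then
  # One CSV line per compute process: PID, process name, memory used.
  nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv
else
  echo "nvidia-smi not found on this host"
fi
```

If the CSV output also reports empty/zero memory for the ollama runner, the problem is in what the driver attributes to the process, not in the table rendering.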

I've looked at the logs and everything looks fine there. The only piece that changed is the Ollama Docker image version.

I was planning to start playing with the new flash attention and K/V cache improvements, and that piece of information is vital for comparing different models and context sizes, so it would be great to get it back (if it's somehow related to Ollama).

So, that's the reason for reporting it. Thanks for all the hard work, you rock!

Ciao :-)

OS

Docker

GPU

Nvidia

CPU

AMD

Ollama version

0.4.7

GiteaMirror added the bug label 2026-04-12 16:11:19 -05:00
Author
Owner

@rick-github commented on GitHub (Dec 5, 2024):

I've noticed something similar when I have `GGML_CUDA_ENABLE_UNIFIED_MEMORY=1` set: the amount of memory reported per runner is much lower (though not 0, as in your case). It works fine, though.
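For anyone wanting to reproduce that reporting difference, the variable would be passed into the container at startup. A minimal sketch, assuming the standard `ollama/ollama` Docker invocation (the volume/port values below are the usual defaults, not taken from this thread):

```shell
# Run the Ollama container with CUDA unified memory enabled, so allocations
# go through managed memory and nvidia-smi may attribute less to the process.
docker run -d --gpus=all \
  -e GGML_CUDA_ENABLE_UNIFIED_MEMORY=1 \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama ollama/ollama
```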

Author
Owner

@stronk7 commented on GitHub (Jan 16, 2025):

For the record, I've just updated to 0.5.6, and it seems that the problems reported here with 0.4.x are gone.

So this can probably be closed now. Proceeding.

Ciao :-)


Reference: github-starred/ollama#5091