[GH-ISSUE #4359] Mistral is not using GPU, but Llama3 is utilizing GPU properly #2721

Closed
opened 2026-04-12 13:02:13 -05:00 by GiteaMirror · 3 comments

Originally created by @itinance on GitHub (May 11, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/4359

Originally assigned to: @dhiltgen on GitHub.

### What is the issue?

When running `mistral:latest` or `stablelm2:latest`, ollama is not utilizing the GPU on Ubuntu with an NVIDIA graphics card.
Running `llama3:70b` is using the GPU very well.

Command **nvidia-smi** on `ollama run mistral:latest`:

```
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.171.04             Driver Version: 535.171.04   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA RTX 4000 SFF Ada ...    Off | 00000000:01:00.0 Off |                  Off |
| 30%   43C    P8              12W /  70W |      4MiB / 20475MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
```
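Since a one-shot `nvidia-smi` can miss a short-lived load, polling it during generation makes it easy to see whether the process ever appears. A minimal sketch using the standard `watch` utility (not part of Ollama):

```
# Refresh nvidia-smi every second; prompt the model in another terminal.
watch -n 1 nvidia-smi
```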

Command **nvidia-smi** on `ollama run llama3:70b`:


```
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.171.04             Driver Version: 535.171.04   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA RTX 4000 SFF Ada ...    Off | 00000000:01:00.0 Off |                  Off |
| 30%   45C    P2              33W /  70W |  19492MiB / 20475MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A    254717      C   ...unners/cuda_v11/ollama_llama_server    19486MiB |
+---------------------------------------------------------------------------------------+
```

Also, running `htop` while mistral is generating shows that the CPU runner has been launched and is consuming 100% on every CPU core. The same happens with *stablelm2*.
With llama3, the CPU cores are almost idle and GPU utilization reaches roughly 15-30%.
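As a cross-check on what `htop` and `nvidia-smi` suggest, Ollama releases newer than the 0.1.32 used here include an `ollama ps` command whose PROCESSOR column reports the CPU/GPU split of the loaded model. A sketch of what a CPU fallback would look like (ID and timing values are illustrative):

```
$ ollama ps
NAME              ID              SIZE      PROCESSOR    UNTIL
mistral:latest    abcdef123456    5.0 GB    100% CPU     4 minutes from now
```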

Version:

ollama version is 0.1.32

GPU: NVIDIA RTX 4000

Ubuntu 22

### OS

Linux

### GPU

Nvidia

### CPU

Intel

### Ollama version

0.1.32

GiteaMirror added the bug and nvidia labels 2026-04-12 13:02:13 -05:00

@TheEpic-dev commented on GitHub (May 15, 2024):

Check the actual logs, not just resource monitors.
https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md
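On a systemd-based Linux install, the server log that the troubleshooting guide refers to can be read with `journalctl`; the grep pattern below is only an illustrative filter, since the exact wording of GPU discovery and offload messages varies between Ollama versions:

```
# Follow the Ollama server log live (systemd-based Linux installs).
journalctl -u ollama -f

# Or search past logs for GPU-related lines (pattern is illustrative).
journalctl -u ollama --no-pager | grep -iE 'cuda|gpu|offload'
```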


@dhiltgen commented on GitHub (May 21, 2024):

@itinance please upgrade to the latest version. If that doesn't resolve your problem, please share your server log so we can see why it didn't load on the GPU.
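For reference, upgrading on Linux is typically done by re-running the official install script and then confirming the new version (as documented on ollama.com; review scripts before piping them into a shell if that is a concern):

```
# Re-running the install script upgrades an existing Linux install.
curl -fsSL https://ollama.com/install.sh | sh

# Confirm the installed version afterwards.
ollama -v
```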


@abh3po commented on GitHub (Aug 22, 2024):

Hello, I'm facing this issue as well. I downloaded the latest ollama from the website; llama3.1 works well, but the mistral models don't run on the GPU and are very slow.

I also only have this issue on Linux; it works well on Windows.
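One diagnostic worth trying here is to explicitly request full GPU offload via Ollama's `num_gpu` option (the number of model layers to place on the GPU); if forcing it surfaces CUDA errors in the server log, that usually explains why the scheduler fell back to CPU. The layer count below is only an example value:

```
# In the interactive REPL, force GPU offload for the session
# (33 is an example; enough to cover Mistral 7B's 32 layers).
$ ollama run mistral
>>> /set parameter num_gpu 33
>>> why is the sky blue?
```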

Reference: github-starred/ollama#2721