[GH-ISSUE #7947] Not using GPU #67144

Closed
opened 2026-05-04 09:32:47 -05:00 by GiteaMirror · 10 comments

Originally created by @frenzybiscuit on GitHub (Dec 5, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/7947

What is the issue?

I have the following setup:

7950x3d (AMD iGPU)
3090 + 2080ti

When using Ollama with open-webui, the GPU (3090) gets used BRIEFLY. It starts using the GPU, utilization ramps up to 90%, and then it just stops and falls back to the CPU.

I have installed Ollama, built from source, on Fedora 41. I have installed the CUDA toolkit manually. I export the following environment variables from my bashrc:

export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH

I can build llamacpp from source and it works with CUDA.

I am launching Ollama with the following:

CUDA_VISIBLE_DEVICES=0 ROCR_VISIBLE_DEVICES=55 OLLAMA_HOST=127.0.0.1 OLLAMA_FLASH_ATTENTION=1 OLLAMA_KV_CACHE_TYPE=q8_0 ollama-0.4.8-rc0/./ollama serve

CUDA_VISIBLE_DEVICES=0 should force it to just use the 3090. I am using ROCR_VISIBLE_DEVICES=55 (fake number) so it doesn't use the AMD iGPU and fall back to CPU.

Any idea why this setup isn't working?

Ollama keeps briefly showing up on nvidia-smi and then vanishing.
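
One way to watch for this is to poll nvidia-smi while a prompt is running (standard NVIDIA tooling, shown here only as a sketch):

```
# Poll GPU utilization once a second while reproducing the issue.
watch -n 1 nvidia-smi
# or, equivalently:
nvidia-smi -l 1
```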

It does show the following error:

"WARN source=sched.go:646 GPU VRAM usage didn't recover within timeout.

OS

Linux

GPU

Other

CPU

AMD

Ollama version

0.4.8-rc0

GiteaMirror added the bug label 2026-05-04 09:32:47 -05:00

@rick-github commented on GitHub (Dec 5, 2024):

Add server logs.

Author
Owner

@frenzybiscuit commented on GitHub (Dec 5, 2024):

[ollama.txt](https://github.com/user-attachments/files/18020595/ollama.txt)

@frenzybiscuit commented on GitHub (Dec 5, 2024):

https://github.com/user-attachments/assets/4d77b4e0-c669-448b-8fe2-7a0862d9f030

@rick-github commented on GitHub (Dec 5, 2024):

time=2024-12-05T01:17:11.613-08:00 level=INFO source=common.go:49 msg="Dynamic LLM libraries" runners="[cpu cpu_avx cpu_avx2]"

The version you've built doesn't have any runners that use the GPU, only CPU runners.

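A quick way to verify this on a from-source build is to look for that same startup line. The runner names vary by version, so treat this as a sketch; a CUDA-capable build of this era typically lists entries like cuda_v11 or cuda_v12 alongside the cpu runners:

```
# Sketch: check which runners the binary was built with from the startup log.
# The cuda_* runner names are assumptions based on typical builds of this version;
# a CPU-only build (as in the log above) shows only cpu/cpu_avx/cpu_avx2.
./ollama serve 2>&1 | grep "Dynamic LLM libraries"
```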

@frenzybiscuit commented on GitHub (Dec 5, 2024):

Okay, what’s the proper way to build GPU runners? I followed the build guide.


@rick-github commented on GitHub (Dec 5, 2024):

Which build guide? Fedora 41 appears to be [not supported yet](https://github.com/ollama/ollama/issues/7869), so building from source may not work. If you have docker installed, you could try the docker image.

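For reference, the GPU-enabled docker invocation documented by the project looks roughly like this (it assumes the NVIDIA Container Toolkit is already installed on the host):

```
# Documented docker run for NVIDIA GPUs; assumes nvidia-container-toolkit is set up.
docker run -d --gpus=all \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama ollama/ollama
```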

@frenzybiscuit commented on GitHub (Dec 5, 2024):

What distros are supported?

thanks.


@rick-github commented on GitHub (Dec 5, 2024):

Most releases that aren't bleeding edge should work. Fedora 41 was released October 29, 2024, so it will take a little work to make sure all the right dependencies are met, etc. Distros that want to build from source or otherwise have custom packages (e.g. Arch) may also have issues with the latest release of ollama. I've had no issues installing ollama on Fedora 38, Ubuntu, Mint, Debian, CentOS, and openSUSE. My preferred method these days is to use docker as it makes switching between versions easy.

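As an illustration of the version-switching point, the image tag can be pinned to a specific release; the tag below is only an example and should be replaced with whatever version is needed:

```
# Sketch: pin a specific release via the image tag (0.4.7 is illustrative).
docker pull ollama/ollama:0.4.7
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 \
  --name ollama-0.4.7 ollama/ollama:0.4.7
```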

@frenzybiscuit commented on GitHub (Dec 5, 2024):

I was able to get this to work on Fedora by using the install.sh method.

It seems to be a problem specifically with compiling from source.

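For completeness, the install.sh method mentioned above is the standard one-liner documented on ollama.com:

```
# Official install script, as documented in the Ollama README.
curl -fsSL https://ollama.com/install.sh | sh
```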

@dhiltgen commented on GitHub (Dec 10, 2024):

@frenzybiscuit PR #7499 may help make it a bit easier to build from source in your scenario.
