[GH-ISSUE #8467] Llama3.1-8B with ollama does not use VRAM, only the CPU #51959

Closed
opened 2026-04-28 21:22:22 -05:00 by GiteaMirror · 6 comments

Originally created by @WinstonCHEN1 on GitHub (Jan 17, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/8467

What is the issue?

After using ollama to pull llama3.1, ollama serve reports that a GPU is detected; the log is as follows:

time=2025-01-16T20:58:45.361+08:00 level=INFO source=routes.go:1238 msg="Listening on 127.0.0.1:11434 (version 0.5.6)"
time=2025-01-16T20:58:45.361+08:00 level=INFO source=routes.go:1267 msg="Dynamic LLM libraries" runners=[cpu]
time=2025-01-16T20:58:45.361+08:00 level=INFO source=gpu.go:226 msg="looking for compatible GPUs"
time=2025-01-16T20:58:45.451+08:00 level=WARN source=amd_linux.go:61 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
time=2025-01-16T20:58:45.451+08:00 level=INFO source=amd_linux.go:297 msg="unsupported Radeon iGPU detected skipping" id=0 total="512.0 MiB"
time=2025-01-16T20:58:45.451+08:00 level=INFO source=amd_linux.go:404 msg="no compatible amdgpu devices detected"
time=2025-01-16T20:58:45.451+08:00 level=INFO source=types.go:131 msg="inference compute" id=GPU-1c625a84-5d45-2274-8fa9-69f73d8ce165 library=cuda variant=v12 compute=8.9 driver=12.4 name="NVIDIA GeForce RTX 4090" total="23.6 GiB" available="21.6 GiB"

But when I run inference with llama3.1, my GPU VRAM usage does not change at all, while CPU usage increases significantly.

It shows that I still have more than 21 GiB of VRAM available, and llama3.1-8b does not need nearly that much. ollama ps reports 100% GPU, but the VRAM is never actually used; only the CPU is busy the whole time, which makes inference very slow. How can I solve this?
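
A quick, generic way to confirm whether the weights actually land in VRAM (not part of the original report, just a standard check) is to watch nvidia-smi while a prompt is running:

# Refresh every second; if the model is offloaded to the GPU, an ollama
# process should appear holding several GiB of VRAM
watch -n 1 nvidia-smi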

OS: Ubuntu 24.04.1 LTS
CPU: Ryzen 9 7950X3D
GPU: RTX 4090
CUDA: 12.4
Mem: 93G available

OS

Linux

GPU

Nvidia

CPU

AMD

Ollama version

0.5.6

GiteaMirror added the bug label 2026-04-28 21:22:22 -05:00

@WinstonCHEN1 commented on GitHub (Jan 17, 2025):

All parameters set to default.


@rick-github commented on GitHub (Jan 17, 2025):

time=2025-01-16T20:58:45.361+08:00 level=INFO source=routes.go:1267 msg="Dynamic LLM libraries" runners=[cpu]

Your installation doesn't have any GPU-enabled runners. How was it installed?

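One way to check for the runner libraries directly (a sketch assuming the standard tarball layout, where the GPU runners are shared libraries in a lib/ollama directory alongside the bin directory):

# ollama locates its runners relative to the server binary, e.g.
# /usr/local/bin/ollama looks in /usr/local/lib/ollama
ls "$(dirname "$(command -v ollama)")/../lib/ollama"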

@WinstonCHEN1 commented on GitHub (Jan 17, 2025):

> time=2025-01-16T20:58:45.361+08:00 level=INFO source=routes.go:1267 msg="Dynamic LLM libraries" runners=[cpu]
>
> Your installation doesn't have any GPU-enabled runners. How was it installed?

Are you referring to the installation of Ollama, or something else?


@WinstonCHEN1 commented on GitHub (Jan 17, 2025):

> time=2025-01-16T20:58:45.361+08:00 level=INFO source=routes.go:1267 msg="Dynamic LLM libraries" runners=[cpu]
>
> Your installation doesn't have any GPU-enabled runners. How was it installed?

Due to network problems, I chose a manual installation: I downloaded the ollama-linux-amd64 file from the GitHub releases, moved it to the system path, configured it as a system service, and then ran ollama serve.

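For comparison, a manual install that keeps the runners in place looks roughly like this (a sketch based on ollama's documented Linux tarball install; the archive contains both bin/ollama and lib/ollama):

# Download the Linux release tarball
curl -L https://ollama.com/download/ollama-linux-amd64.tgz -o ollama-linux-amd64.tgz
# Extract to /usr so the binary and its runner libraries stay side by side
sudo tar -C /usr -xzf ollama-linux-amd64.tgz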

@rick-github commented on GitHub (Jan 17, 2025):

ollama finds runners relative to the path of the server. By moving the ollama program to the system path, it's unable to find the runners. A better solution is to move the program back to where it was and add a symbolic link:

# Locate the moved ollama binary on the current PATH
system=$(command -v ollama)
# Move it to /usr/local/bin so the runners under /usr/local/lib/ollama can be found
sudo mv $system /usr/local/bin/ollama
# Leave a symlink at the old location so the original path keeps working
sudo ln -s /usr/local/bin/ollama $system
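
After relinking, restarting the server should make GPU runners appear in the startup log (assuming the systemd service described above; the exact runner names vary by build):

# Restart the service, then check that the runner list is no longer just [cpu]
sudo systemctl restart ollama
journalctl -u ollama | grep "Dynamic LLM libraries"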

@WinstonCHEN1 commented on GitHub (Jan 17, 2025):

> ollama finds runners relative to the path of the server. By moving the ollama program to the system path, it's unable to find the runners. A better solution is to move the program back to where it was and add a symbolic link:
>
> system=$(command -v ollama)
> sudo mv $system /usr/local/bin/ollama
> sudo ln -s /usr/local/bin/ollama $system

I used this method and the problem has been solved. Thank you for your help!

Reference: github-starred/ollama#51959