[GH-ISSUE #8485] [0.5.7] small models are loaded to GPU, but inference is slow and using a lot of CPU #67521

Closed
opened 2026-05-04 10:39:29 -05:00 by GiteaMirror · 6 comments
Owner

Originally created by @kha84 on GitHub (Jan 19, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/8485

What is the issue?

Hello there.

Just upgraded from ollama 0.4.x to the latest version (0.5.7) and immediately noticed that inference of all models (even small ones, like llama 3.2 3B) became very slow, like orders of magnitude slower. I can see that during inference the CPU is being used intensively, even though the model itself is loaded into the GPU's VRAM and a lot of VRAM is still free (as per nvtop).

OS Ubuntu 22 LTS
RTX 4090

OS

Linux

GPU

Nvidia

CPU

Intel

Ollama version

0.5.7

GiteaMirror added the bug label 2026-05-04 10:39:29 -05:00
Author
Owner

@rick-github commented on GitHub (Jan 19, 2025):

Server logs (https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#how-to-troubleshoot-issues) will aid in debugging.

Author
Owner

@kha84 commented on GitHub (Jan 19, 2025):

Sure, 5 minutes

Author
Owner

@kha84 commented on GitHub (Jan 19, 2025):

So right before the upgrade it was writing this to the logs:

```
Jan 19 13:43:30 xxxxxxxxxxxxxxxxxxx ollama[617005]: time=2025-01-19T13:43:30.339Z level=INFO source=routes.go:1267 msg="Dynamic LLM libraries" runners="[cpu cpu_avx cpu_avx2 cuda_v11_avx cuda_v12_avx rocm>
```

... but after the upgrade it degraded to CPU only:

```
Jan 19 14:00:07 dwhxxxxxxxxxxxxxxxxxxx  ollama[623253]: time=2025-01-19T14:00:07.148Z level=INFO source=routes.go:1267 msg="Dynamic LLM libraries" runners=[cpu]
```

The upgrade was done using the official instructions (https://github.com/ollama/ollama/blob/main/docs/faq.md):

```
curl -fsSL https://ollama.com/install.sh | sh
```
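The `runners=` field in those log lines is the quickest way to spot this regression. A minimal sketch of checking it programmatically (the log format is inferred from the entries above; the example runner lists are illustrative, not actual output):

```python
import re

def parse_runners(log_line: str) -> list[str]:
    """Extract runner names from an ollama 'Dynamic LLM libraries' log line.

    Handles both the quoted form (runners="[cpu cpu_avx ...]") and the
    unquoted form (runners=[cpu]) seen in the logs above.
    """
    m = re.search(r'runners="?\[([^\]"]*)', log_line)
    if not m:
        return []
    return m.group(1).split()

def gpu_runner_present(runners: list[str]) -> bool:
    # A healthy NVIDIA install should list at least one cuda_* runner.
    return any(r.startswith("cuda") for r in runners)

# Illustrative log fragments modeled on the lines quoted above.
healthy = 'msg="Dynamic LLM libraries" runners="[cpu cpu_avx cpu_avx2 cuda_v11_avx cuda_v12_avx]"'
broken = 'msg="Dynamic LLM libraries" runners=[cpu]'

print(gpu_runner_present(parse_runners(healthy)))  # True
print(gpu_runner_present(parse_runners(broken)))   # False
```

If the list collapses to `[cpu]` after an upgrade, the GPU runner libraries were not installed correctly and inference falls back to the CPU, which matches the slowdown described in this issue.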
Author
Owner

@kha84 commented on GitHub (Jan 19, 2025):

I can see a few similar issues have been raised by other people:
https://github.com/ollama/ollama/issues/8467
https://github.com/ollama/ollama/issues/8474

Author
Owner

@kha84 commented on GitHub (Jan 19, 2025):

I was thinking of trying a different version, but first gave 0.5.7 another go: I installed the same thing again (literally just re-ran the same curl install) and it automagically fixed the issue. Kinda weird.

Author
Owner

@818000 commented on GitHub (Jan 19, 2025):

Thank you. Your email is received and will be handled as soon as possible.

Reference: github-starred/ollama#67521