[GH-ISSUE #2566] Not enough vram available, falling back to CPU only, AMD 16 GB VRAM #27266

Closed
opened 2026-04-22 04:27:35 -05:00 by GiteaMirror · 7 comments

Originally created by @user82622 on GitHub (Feb 17, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/2566

Originally assigned to: @jmorganca on GitHub.

I use an iGPU with ROCm, and it worked great until yesterday, when I recompiled my Docker image with the newest ollama version. Since then I get "not enough vram available, falling back to CPU only", even though the GPU seems to be detected:

```
time=xxx level=INFO source=gpu.go:311 msg="Discovered GPU libraries: [/opt/rocm/lib/librocm_smi64.so.6.0.60000 /opt/rocm-6.0.0/lib/librocm_smi64.so.6.0.60000]"
time=xxx level=INFO source=gpu.go:109 msg="Radeon GPU detected"
time=xxx level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
[GIN] xxx | 200 |    4.592477ms |   192.168.33.14 | GET      "/api/tags"
time=xxx level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=xxx level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=xxx level=INFO source=llm.go:111 msg="not enough vram available, falling back to CPU only"
```
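
A useful first check here (not part of the original report) is to compare what ROCm itself reports as dedicated VRAM against what ollama logs. For an iGPU the driver usually reports only the fixed UMA frame buffer carved out by the BIOS, which can be far smaller than the "16 GB" in the title. A minimal diagnostic sketch, assuming standard ROCm tooling is available inside the container:

```
# Show per-GPU VRAM as the driver sees it. For an iGPU this is typically
# just the BIOS UMA carve-out (often well under 16 GB), not total system
# RAM, which would explain the "not enough vram available" fallback.
rocm-smi --showmeminfo vram
```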
GiteaMirror added the amd, bug labels 2026-04-22 04:27:35 -05:00

@SinanAkkoyun commented on GitHub (Feb 18, 2024):

@user82622 How did you install ollama for AMD? I cannot get it to work at all


@user82622 commented on GitHub (Feb 18, 2024):

I compiled the Docker container with ROCm and Ollama based on https://github.com/prawilny/ollama-rocm-docker
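
For anyone reproducing this setup today, the command below is an illustrative sketch rather than the reporter's exact build. It assumes the official ollama/ollama:rocm image from the upstream Docker docs and the standard AMD device nodes:

```
# Run ollama with ROCm support by passing through the AMD compute (KFD)
# and render (DRI) device nodes to the container.
docker run -d \
  --device /dev/kfd \
  --device /dev/dri \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  ollama/ollama:rocm
```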


@kennethwork101 commented on GitHub (Feb 25, 2024):

I ran into the same issue while running a set of tests on ollama version 0.1.25.
Note that each test loads a different LLM. This is reproducible, but only after a large number of tests (50 or more).
The configuration is Windows 11 with WSL2 (Ubuntu 22.04) and an RTX 4070 Ti.
After this error the system does not recover until the ollama server is restarted.

```
time=2024-02-24T22:54:20.311-08:00 level=INFO source=dyn_ext_server.go:156 msg="Starting llama main loop"
[GIN] 2024/02/24 - 22:54:38 | 200 | 21.560724222s | 127.0.0.1 | POST "/api/generate"
time=2024-02-24T22:54:38.515-08:00 level=INFO source=routes.go:78 msg="changing loaded model"
time=2024-02-24T22:54:38.607-08:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-02-24T22:54:38.607-08:00 level=INFO source=gpu.go:146 msg="CUDA Compute Capability detected: 8.9"
time=2024-02-24T22:54:38.607-08:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-02-24T22:54:38.607-08:00 level=INFO source=gpu.go:146 msg="CUDA Compute Capability detected: 8.9"
time=2024-02-24T22:54:38.607-08:00 level=INFO source=llm.go:111 msg="not enough vram available, falling back to CPU only"
time=2024-02-24T22:54:38.607-08:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
loading library /tmp/ollama3199692928/cpu_avx2/libext_server.so
```
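
A simple way to confirm whether repeated model swaps are leaking GPU memory (not part of the original comment) is to watch usage while the test suite runs. A minimal sketch, assuming an NVIDIA GPU and the standard nvidia-smi tool:

```
# Poll used vs. total GPU memory once per second. If memory.used keeps
# climbing across model swaps and never drops back, the server is not
# releasing VRAM between loads, which would match the behavior above.
nvidia-smi --query-gpu=memory.used,memory.total --format=csv -l 1
```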


@kennethwork101 commented on GitHub (Feb 29, 2024):

Reproduced this issue on Ubuntu 22.04.1 with an RTX 4070 Ti:

```
uname -a
Linux kenneth-MS-7E06 6.5.0-21-generic #21~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Fri Feb 9 13:32:52 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
kenneth@kenneth-MS-7E06:~$ ollama --version
ollama version is 0.1.27

Feb 29 12:32:11 kenneth-MS-7E06 ollama[1774]: time=2024-02-29T12:32:11.704-08:00 level=INFO source=llm.go:111 msg="not enough vram available, falling back to CPU only"
```
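
The last line above comes from the systemd journal, so on a standard Linux install the service can be inspected and bounced without rebooting the machine. A hedged sketch, assuming the default ollama systemd unit created by the install script:

```
# Follow the service logs to catch the CPU-fallback message as it happens.
journalctl -u ollama -f

# Restart the service once it has fallen back to CPU; per the earlier
# comment, it does not recover on its own.
sudo systemctl restart ollama
```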


@theiamdude commented on GitHub (Mar 17, 2024):

I am getting a similar error on an M1 Pro chip. I am running version 0.1.29, which I installed via Homebrew.

```
time=2024-03-17T22:03:50.073+11:00 level=INFO source=llm.go:76 msg="not enough vram available, setting num_gpu=0"
```


@dhiltgen commented on GitHub (Mar 21, 2024):

AMD integrated GPUs are not yet supported. We're tracking that support in #2637.

A few folks have chimed in with memory allocation issues. We're working on improvements to our memory prediction, which should help us better utilize the available memory in a future release.
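
Until those memory-prediction improvements land, the num_gpu option (the same one visible in the log line above) can be set manually to pin how many layers are offloaded, sidestepping the automatic VRAM estimate. An illustrative sketch against the standard ollama REST API; the model name and layer count are placeholders:

```
# Force a fixed number of layers onto the GPU instead of trusting the
# automatic estimate; lower num_gpu if the model then fails to load.
curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?",
  "options": { "num_gpu": 20 }
}'
```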


@herbgruutz commented on GitHub (May 19, 2024):

How are you all able to change the VRAM allocation in your BIOS? I'm running an HP and I don't have any option to do so.
Running Fedora 40
AMD Ryzen™ 7 5800U with Radeon™ Graphics × 16
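
Not from the original comment, but when the BIOS exposes no UMA setting, the amdgpu driver's sysfs files at least show how the memory is currently split. A minimal sketch, assuming the iGPU is card0 (adjust the path for your system):

```
# Dedicated VRAM (the fixed BIOS carve-out) vs. GTT (system RAM the iGPU
# can also map); values are in bytes. The "not enough vram" check presumably
# sees only the small dedicated figure on laptops like this one.
cat /sys/class/drm/card0/device/mem_info_vram_total
cat /sys/class/drm/card0/device/mem_info_gtt_total
```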

Reference: github-starred/ollama#27266