[GH-ISSUE #6572] Ollama States Not Enough Video Memory When It Detects Enough #4138

GiteaMirror · 2026-04-12T15:03:02-05:00

GiteaMirror commented

2026-04-12 15:03:02 -05:00

Originally created by @czhang03 on GitHub (Aug 30, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/6572

Originally assigned to: @dhiltgen on GitHub.

What is the issue?

This was tested with the Alpaca app. Here is the Ollama log:

time=2024-08-30T15:52:56.823-04:00 level=DEBUG source=sched.go:219 msg="loading first model" model=/var/home/cheng/.var/app/com.jeffser.Alpaca/data/.ollama/models/blobs/sha256-8eeb52dfb3bb9aefdf9d1ef24b3bdbcfbe82238798c4b918278320b6fcef18fe
time=2024-08-30T15:52:56.823-04:00 level=DEBUG source=memory.go:101 msg=evaluating library=rocm gpu_count=1 available="[1.3 GiB]"
time=2024-08-30T15:52:56.823-04:00 level=DEBUG source=memory.go:168 msg="gpu has too little memory to allocate any layers" gpu="{memInfo:{TotalMemory:2147483648 FreeMemory:1376280576 FreeSwap:0} Library:rocm Variant:no vector extensions MinimumMemory:479199232 DependencyPath:/var/run/host/usr/lib64/rocm/gfx11/lib EnvWorkarounds:[] UnreliableFreeMemory:false ID:0 Name:1002:15bf Compute:gfx1103 DriverMajor:0 DriverMinor:0}"

The log seems to state that the available memory is 1376280576 bytes, while the required GPU memory is only 479199232 bytes, which is far smaller than the available memory.
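To make the comparison concrete, the raw byte counts from the log can be converted to binary units. This is a minimal Python sketch: the helper name `to_mib` is made up for illustration, and the three numbers are copied from the `memInfo` fields in the log line above. It only compares the two fields against each other and does not claim to reproduce Ollama's actual check.

```python
MIB = 1024 * 1024


def to_mib(n_bytes: int) -> float:
    """Convert a byte count to mebibytes (MiB)."""
    return n_bytes / MIB


# Values copied from the memInfo struct in the log line above.
total_memory = 2147483648
free_memory = 1376280576
minimum_memory = 479199232

print(f"total:          {to_mib(total_memory):.1f} MiB")
print(f"free:           {to_mib(free_memory):.1f} MiB")
print(f"minimum:        {to_mib(minimum_memory):.1f} MiB")
print(f"free - minimum: {to_mib(free_memory - minimum_memory):.1f} MiB")
```

The free value comes out at roughly 1312.5 MiB, which matches the rounded `available="[1.3 GiB]"` in the log, and the minimum is exactly 457 MiB.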

However, I did apply some hacks because it is packaged as a Flatpak, so I am not sure whether these hacks are relevant:

I only gave the Flatpak access to /sys/module/amdgpu/ and host-os so that it would detect the GPU.
I set HSA_OVERRIDE_GFX_VERSION=11.0.0 because gfx1103 is unsupported by Ollama.
I added /var/run/host/usr/lib64/rocm/gfx11/lib to the library path so that Ollama would detect the library. Note that I used gfx11 instead of gfx1100 because gfx1100 is an empty folder on my machine.

Related issue: https://github.com/Jeffser/Alpaca/issues/139
Additional information:

I am running Fedora Silverblue, and the ROCm libraries are installed via rpm-ostree install rocminfo hipblas.
Related ROCm versions: rocminfo-6.1.1-3.fc40.x86_64 and rocm-runtime-6.1.2-1.fc40.x86_64
I didn't install the dkms version of the kernel, because dkms plus Secure Boot is a nightmare.

To completely reproduce my setup:

Install Alpaca: https://flathub.org/apps/com.jeffser.Alpaca
Grant the following file permissions: /sys/module/amdgpu/ and host-os
Set the following environment variables: OLLAMA_DEBUG=1, HSA_OVERRIDE_GFX_VERSION=11.0.0, LD_LIBRARY_PATH=/var/run/host/usr/lib64/rocm/gfx11/lib:/app/lib:/usr/lib/x86_64-linux-gnu/GL/default/lib:/usr/lib/x86_64-linux-gnu/openh264/extra:/usr/lib/sdk/llvm15/lib:/usr/lib/sdk/openjdk11/lib:/usr/lib/sdk/openjdk17/lib:/usr/lib/x86_64-linux-gnu/GL/default/lib
Enter the Alpaca sandbox: flatpak run --command=bash com.jeffser.Alpaca
Run ollama serve in the sandbox
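The environment part of these steps could be scripted roughly as follows. This is only a sketch under several assumptions: `build_flatpak_command` is a hypothetical helper, it starts `ollama serve` directly through `--command=ollama` instead of opening a bash shell first, the library path is abbreviated to its first two entries, and the file permissions are assumed to have been granted separately. It builds the argument list without running anything, and it has not been verified on this setup.

```python
def build_flatpak_command(lib_dirs: list[str]) -> list[str]:
    """Build an argv list that starts `ollama serve` inside the Alpaca sandbox."""
    env = {
        "OLLAMA_DEBUG": "1",
        "HSA_OVERRIDE_GFX_VERSION": "11.0.0",
        "LD_LIBRARY_PATH": ":".join(lib_dirs),
    }
    argv = ["flatpak", "run"]
    # `flatpak run` accepts --env=VAR=VALUE to set variables inside the sandbox.
    argv += [f"--env={name}={value}" for name, value in env.items()]
    # Assumption: run ollama directly instead of entering bash first.
    argv += ["--command=ollama", "com.jeffser.Alpaca", "serve"]
    return argv


# Abbreviated library path: only the first two entries from the report.
cmd = build_flatpak_command(["/var/run/host/usr/lib64/rocm/gfx11/lib", "/app/lib"])
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd)` once the printed command looks right.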

OS

Linux

GPU

AMD

CPU

AMD

Ollama version

0.3.3

GiteaMirror added the memory needs more info labels 2026-04-12 15:03:02 -05:00

GiteaMirror closed this issue

2026-04-12 15:03:03 -05:00

GiteaMirror commented

2026-04-12 15:03:04 -05:00

@igorschlum commented on GitHub (Sep 1, 2024):

@czhang03 why version 0.3.3 and not version 0.3.9?

GiteaMirror commented

2026-04-12 15:03:05 -05:00

@Jeffser commented on GitHub (Sep 2, 2024):

> @czhang03 why version 0.3.3 and not version 0.3.9?

Hi, I'm the developer of Alpaca. Ollama gets updated with every Alpaca update. I haven't released a version since then, but one is coming soon with Ollama 0.3.9 included.

GiteaMirror commented

2026-04-12 15:03:06 -05:00

@igorschlum commented on GitHub (Sep 2, 2024):

@Jeffser I think Ollama has resolved some issues related to VRAM memory. If you update Alpaca to the latest version of Ollama, the issue you're facing might be solved.

Additionally, how can I run Alpaca on macOS? Is there a Docker solution to run Flatpak packages in Docker to make it work on macOS?

GiteaMirror commented

2026-04-12 15:03:06 -05:00

@Jeffser commented on GitHub (Sep 2, 2024):

> @Jeffser I think Ollama has resolved some issues related to VRAM memory. If you update Alpaca to the latest version of Ollama, the issue you're facing might be solved.

Alright, I will update the instance.

> Additionally, how can I run Alpaca on macOS? Is there a Docker solution to run Flatpak packages in Docker to make it work on macOS?

AFAIK there's no way of running Flatpaks in Docker or natively on macOS. I'll see if I can make a port for Mac; GTK apps should be able to run. It might take a couple of days, though.

GiteaMirror commented

2026-04-12 15:03:07 -05:00

@igorschlum commented on GitHub (Sep 2, 2024):

https://www.gtk.org/docs/installations/macos

Yes, it could be a good solution.

GiteaMirror commented

2026-04-12 15:03:07 -05:00

@dhiltgen commented on GitHub (Sep 5, 2024):

I believe you're running on a 780M iGPU with 2G assigned in BIOS for VRAM. What model are you trying to load? I'm not sure whether there's a bug here or you're trying to load a model that is too large for your dedicated VRAM. You can try loading a smaller model, reducing the context size, or adjusting your BIOS settings to allocate more system memory to the iGPU.
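As an illustration of the "reduce the context size" suggestion, the Ollama REST API accepts a `num_ctx` value in the `options` object of a request. The sketch below only builds the JSON body for `POST /api/generate`. The helper name `build_generate_request` is hypothetical, and the model tag and the value 1024 are arbitrary examples rather than recommendations from this thread.

```python
import json


def build_generate_request(model: str, prompt: str, num_ctx: int) -> bytes:
    """Return a JSON body for POST /api/generate with a custom context size."""
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,
        # num_ctx sets the context window size (in tokens) for this request.
        "options": {"num_ctx": num_ctx},
    }
    return json.dumps(payload).encode("utf-8")


# Example values only; pick a model and context size that suit the hardware.
body = build_generate_request("smollm:135m", "Hello", num_ctx=1024)
print(body.decode("utf-8"))
```

The body could then be sent with `urllib.request` to the server started by `ollama serve`, which listens on `http://localhost:11434` by default.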

GiteaMirror commented

2026-04-12 15:03:08 -05:00

@igorschlum commented on GitHub (Sep 5, 2024):

@Jeffser Could you try to build a version with a smaller LLM like Smollm, so we could see if it fixes the issue found by @czhang03?

@czhang03 could you try to run Ollama alone and see if you can reproduce the issue? If not, I suggest closing this issue and letting @Jeffser create a new issue if he can reproduce the issue without his solution.

GiteaMirror commented

2026-04-12 15:03:08 -05:00

@Jeffser commented on GitHub (Sep 5, 2024):

Hi, Alpaca allows users to download and use any model. I also updated the app with the newest Ollama version.

GiteaMirror commented

2026-04-12 15:03:09 -05:00

@igorschlum commented on GitHub (Sep 5, 2024):

@Jeffser Thank you. @czhang03 can you test again and try the older model or smollm, a smaller model?

GiteaMirror commented

2026-04-12 15:03:09 -05:00

@czhang03 commented on GitHub (Sep 9, 2024):

Thank you guys for the quick response and help.

I have tested on 0.3.9 and experienced similar issues:

time=2024-09-09T10:40:43.425-04:00 level=DEBUG source=memory.go:168 msg="gpu has too little memory to allocate any layers" gpu="{memInfo:{TotalMemory:2147483648 FreeMemory:613441536 FreeSwap:0} Library:rocm Variant: MinimumMemory:479199232 DependencyPath:/var/run/host/usr/lib64/rocm/gfx11/lib EnvWorkarounds:[] UnreliableFreeMemory:false ID:0 Name:1002:15bf Compute:gfx1103 DriverMajor:0 DriverMinor:0}"

Here the free memory again seems to be larger than the minimum memory, yet Ollama still reports too little memory.

On a probably unrelated note,

time=2024-09-09T10:35:32.855-04:00 level=WARN source=amd_linux.go:59 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"

It seems that, unlike 0.3.3, Ollama 0.3.9 looks for the file /sys/module/amdgpu/version, which is not present on my system:

> ls /sys/module/amdgpu/
coresize  drivers/  holders/  initsize  initstate  notes/  parameters/  refcnt  sections/  taint  uevent
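The same check can be reproduced outside Ollama with a few lines of Python. This is a sketch rather than Ollama's own logic: `amdgpu_version` is a hypothetical helper, and the only path taken from the log is `/sys/module/amdgpu/version`.

```python
from pathlib import Path
from typing import Optional


def amdgpu_version(module_dir: str = "/sys/module/amdgpu") -> Optional[str]:
    """Return the driver version string, or None if the version file is absent."""
    version_file = Path(module_dir) / "version"
    if not version_file.is_file():
        return None
    return version_file.read_text().strip()


version = amdgpu_version()
print("amdgpu version:", version if version is not None else "file missing")
```

On a system like the one shown in the listing above, this would be expected to report that the file is missing, in line with the warning.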

@Jeffser is pushing a flatpak plugin to incorporate rocm: https://github.com/flathub/flathub/pull/5552 , which might improve the current situation of AMD support for ollama in Alpaca. I will probably have more time to do more in-depth testing in a few days.

GiteaMirror commented

2026-04-12 15:03:09 -05:00

@dhiltgen commented on GitHub (Sep 9, 2024):

With version v0.3.10 the log message "gpu has too little memory..." will have more details about the calculations so that will help us root cause these scenarios. My suspicion is you may be setting a large context size.

GiteaMirror commented

2026-04-12 15:03:10 -05:00

@igorschlum commented on GitHub (Sep 9, 2024):

@czhang03 you can download a release candidate of version 0.3.10 here: https://github.com/ollama/ollama/releases

GiteaMirror commented

2026-04-12 15:03:11 -05:00

@czhang03 commented on GitHub (Sep 11, 2024):

Hi, I was able to get 0.3.10 running in distrobox. I tried smollm:135m, and everything seems to work fine. But when I moved to llama 8b, I got the following error:

time=2024-09-11T14:30:23.328-04:00 level=DEBUG source=sched.go:224 msg="loading first model" model=/var/home/cheng/.ollama/models/blobs/sha256-8eeb52dfb3bb9aefdf9d1ef24b3bdbcfbe82238798c4b918278320b6fcef18fe
time=2024-09-11T14:30:23.328-04:00 level=DEBUG source=memory.go:103 msg=evaluating library=rocm gpu_count=1 available="[633.3 MiB]"
time=2024-09-11T14:30:23.329-04:00 level=DEBUG source=memory.go:170 msg="gpu has too little memory to allocate any layers" id=0 library=rocm variant="" compute=gfx1103 driver=0.0 name=1002:15bf total="2.0 GiB" available="633.3 MiB" minimum_memory=479199232 layer_size="149.0 MiB" gpu_zer_overhead="0 B" partial_offload="677.5 MiB" full_offload="560.0 MiB"
time=2024-09-11T14:30:23.329-04:00 level=DEBUG source=memory.go:312 msg="insufficient VRAM to load any model layers"
time=2024-09-11T14:30:23.329-04:00 level=DEBUG source=memory.go:103 msg=evaluating library=rocm gpu_count=1 available="[633.3 MiB]"
time=2024-09-11T14:30:23.329-04:00 level=DEBUG source=memory.go:170 msg="gpu has too little memory to allocate any layers" id=0 library=rocm variant="" compute=gfx1103 driver=0.0 name=1002:15bf total="2.0 GiB" available="633.3 MiB" minimum_memory=479199232 layer_size="125.0 MiB" gpu_zer_overhead="0 B" partial_offload="677.5 MiB" full_offload="258.5 MiB"
time=2024-09-11T14:30:23.329-04:00 level=DEBUG source=memory.go:312 msg="insufficient VRAM to load any model layers"
time=2024-09-11T14:30:23.329-04:00 level=DEBUG source=memory.go:103 msg=evaluating library=rocm gpu_count=1 available="[633.3 MiB]"
time=2024-09-11T14:30:23.329-04:00 level=DEBUG source=memory.go:170 msg="gpu has too little memory to allocate any layers" id=0 library=rocm variant="" compute=gfx1103 driver=0.0 name=1002:15bf total="2.0 GiB" available="633.3 MiB" minimum_memory=479199232 layer_size="149.0 MiB" gpu_zer_overhead="0 B" partial_offload="677.5 MiB" full_offload="560.0 MiB"
time=2024-09-11T14:30:23.329-04:00 level=DEBUG source=memory.go:312 msg="insufficient VRAM to load any model layers"
time=2024-09-11T14:30:23.329-04:00 level=DEBUG source=memory.go:103 msg=evaluating library=rocm gpu_count=1 available="[633.3 MiB]"
time=2024-09-11T14:30:23.329-04:00 level=DEBUG source=memory.go:170 msg="gpu has too little memory to allocate any layers" id=0 library=rocm variant="" compute=gfx1103 driver=0.0 name=1002:15bf total="2.0 GiB" available="633.3 MiB" minimum_memory=479199232 layer_size="125.0 MiB" gpu_zer_overhead="0 B" partial_offload="677.5 MiB" full_offload="258.5 MiB"
time=2024-09-11T14:30:23.329-04:00 level=DEBUG source=memory.go:312 msg="insufficient VRAM to load any model layers"
time=2024-09-11T14:30:23.330-04:00 level=INFO source=server.go:101 msg="system memory" total="13.4 GiB" free="6.6 GiB" free_swap="3.9 GiB"
time=2024-09-11T14:30:23.330-04:00 level=DEBUG source=memory.go:103 msg=evaluating library=rocm gpu_count=1 available="[633.3 MiB]"
time=2024-09-11T14:30:23.330-04:00 level=DEBUG source=memory.go:170 msg="gpu has too little memory to allocate any layers" id=0 library=rocm variant="" compute=gfx1103 driver=0.0 name=1002:15bf total="2.0 GiB" available="633.3 MiB" minimum_memory=479199232 layer_size="125.0 MiB" gpu_zer_overhead="0 B" partial_offload="677.5 MiB" full_offload="258.5 MiB"
time=2024-09-11T14:30:23.330-04:00 level=DEBUG source=memory.go:312 msg="insufficient VRAM to load any model layers"

I am not sure what each value in the debug logs means, but I hope these messages are helpful for you.
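The thread does not spell out how these fields combine. One plausible reading, which is an assumption here and not something confirmed in this issue, is that a GPU is skipped when its available memory is less than the overhead plus the larger of the two offload graph sizes plus the minimum memory plus two layers. The Python sketch below parses the first "too little memory" line (a few unrelated fields are omitted) and evaluates that assumed sum. The helper names `parse_size` and `parse_fields` are made up for illustration.

```python
import shlex

UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3}


def parse_size(text: str) -> int:
    """Parse '677.5 MiB' or a bare byte count such as '479199232' into bytes."""
    parts = text.split()
    if len(parts) == 1:
        return int(parts[0])
    value, unit = parts
    return int(float(value) * UNITS[unit])


def parse_fields(line: str) -> dict:
    """Split a key=value log line into a dict, honouring double quotes."""
    fields = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


# Fields copied from the first "gpu has too little memory" line above.
line = ('id=0 library=rocm total="2.0 GiB" available="633.3 MiB" '
        'minimum_memory=479199232 layer_size="149.0 MiB" gpu_zer_overhead="0 B" '
        'partial_offload="677.5 MiB" full_offload="560.0 MiB"')

f = parse_fields(line)
graph = max(parse_size(f["partial_offload"]), parse_size(f["full_offload"]))
# ASSUMPTION: required = overhead + largest graph + minimum + two layers.
required = (parse_size(f["gpu_zer_overhead"]) + graph
            + parse_size(f["minimum_memory"]) + 2 * parse_size(f["layer_size"]))
available = parse_size(f["available"])

print(f"required ~ {required / 1024**2:.1f} MiB")
print(f"available  {available / 1024**2:.1f} MiB")
```

Under that assumption the requirement is about 1432.5 MiB against 633.3 MiB available, which would explain why no layers are offloaded even though the available memory exceeds `minimum_memory` alone.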

GiteaMirror commented

2026-04-12 15:03:11 -05:00

@igorschlum commented on GitHub (Sep 11, 2024):

@czhang03 the message says that you have only 633.3 MiB (mebibytes) available and that the LLM needs 2 GiB (gibibytes), so you need at least 3 times more VRAM to run the model.

It seems that you have 1.3 GB of memory. Ollama could load the first layer of 633 MB and then nothing more due to the VRAM limitation. That is why a tiny LLM works and llama3.1 does not.

GiteaMirror commented

2026-04-12 15:03:12 -05:00

@czhang03 commented on GitHub (Sep 11, 2024):

Thanks for the info. It seems like an iGPU is not the best platform for local LLMs then.

GiteaMirror commented

2026-04-12 15:03:12 -05:00

@czhang03 commented on GitHub (Sep 11, 2024):

BTW, I always assumed total="2.0 GiB" means my total VRAM is 2 GB, which makes sense, as that is the setting in my UEFI configuration. Is my understanding correct?

AFAIK an iGPU uses system memory as VRAM, and I have a good amount of it (16 GB). Is there any way to let Ollama request more memory to be used as VRAM?

GiteaMirror commented

2026-04-12 15:03:14 -05:00

@igorschlum commented on GitHub (Sep 11, 2024):

An iGPU is a graphics processing unit that is integrated directly into the same chip as the CPU (Central Processing Unit). Unlike dedicated GPUs, which are separate from the CPU and have their own dedicated memory, iGPUs share system memory with the CPU.

Mac Studio computers that use Apple Silicon chips (such as the M1 Max, M1 Ultra, M2 Max, etc.) have a GPU integrated directly into the chip, similar to an iGPU (Integrated GPU), but it is far more powerful than traditional iGPUs.

In this context, you could technically say they use an integrated GPU, but it is not the typical iGPU (like those found in Intel processors). The GPU in Apple Silicon chips is optimized to deliver high performance, often rivaling dedicated graphics cards (dGPUs) in certain scenarios.

So, for Mac Studio with Apple Silicon, you could say they have an “iGPU,” but with capabilities that far exceed those of a typical iGPU.

GiteaMirror commented

2026-04-12 15:03:15 -05:00

@igorschlum commented on GitHub (Sep 11, 2024):

@czhang03 you can ask ChatGPT or Phind.com to get help on iGPUs.

On PCs, it is often possible to manually allocate more memory to an iGPU (Integrated GPU), depending on the motherboard and BIOS. Here’s how it typically works:

1. BIOS/UEFI Settings: On many PCs, you can enter the BIOS/UEFI settings during boot-up and manually adjust the amount of memory allocated to the iGPU. This is often found under settings related to Graphics, Integrated Peripherals, or Advanced Chipset Configuration.
2. Fixed Allocation: You may be able to set a fixed amount of system RAM for the iGPU, such as 512MB, 1GB, or more. However, this reduces the available memory for other tasks since it’s permanently reserved for the iGPU.
3. Dynamic Allocation: Some systems use dynamic memory allocation, where the iGPU automatically uses more RAM when needed, up to a certain limit, without requiring manual intervention.
4. Limitations: The maximum amount of RAM you can allocate to an iGPU depends on the total RAM installed and the motherboard’s capabilities. Some lower-end systems may cap the amount of memory the iGPU can use.

In summary, yes, it’s possible to allocate more memory to the iGPU on a PC, but it varies depending on the system’s hardware and BIOS capabilities.
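After changing the BIOS/UEFI allocation on Linux, the result can be checked through the sysfs files that the amdgpu driver exposes for each card. This is a minimal Python sketch: `read_amdgpu_memory` is a hypothetical helper, and the `card0` path is an assumption, since the iGPU may show up as `card1` or higher on some machines.

```python
from pathlib import Path


def read_amdgpu_memory(device_dir: str) -> dict:
    """Read VRAM and GTT totals (in bytes) from an amdgpu device directory."""
    result = {}
    for name in ("mem_info_vram_total", "mem_info_gtt_total"):
        path = Path(device_dir) / name
        if path.is_file():
            result[name] = int(path.read_text().strip())
    return result


# Assumption: the iGPU is card0; adjust the path if it is enumerated differently.
info = read_amdgpu_memory("/sys/class/drm/card0/device")
for name, n_bytes in info.items():
    print(f"{name}: {n_bytes / 1024**3:.2f} GiB")
```

`mem_info_vram_total` reflects the dedicated carve-out, while `mem_info_gtt_total` is the amount of system memory the GPU can map in addition to it. If the files are not present, the function returns an empty dict and nothing is printed.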

GiteaMirror commented

2026-04-12 15:03:16 -05:00

@czhang03 commented on GitHub (Sep 11, 2024):

It seems like there is already an issue tracking UMA support in Ollama: https://github.com/ollama/ollama/issues/2637

llama.cpp has a compiler flag that enables dynamic allocation of VRAM instead of the default check-and-fail approach: https://github.com/Mozilla-Ocho/llamafile/discussions/366

GiteaMirror referenced this issue

2026-04-12 23:29:18 -05:00

[PR #4138] [CLOSED] Add log to file flag for server #11394

Reference: github-starred/ollama#4138