[GH-ISSUE #6572] Ollama States Not Enough Video Memory When It Detects Enough #4138

Closed
opened 2026-04-12 15:03:02 -05:00 by GiteaMirror · 19 comments

Originally created by @czhang03 on GitHub (Aug 30, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/6572

Originally assigned to: @dhiltgen on GitHub.

What is the issue?

This was tested with the Alpaca app (https://flathub.org/apps/com.jeffser.Alpaca). Here is the Ollama log:

time=2024-08-30T15:52:56.823-04:00 level=DEBUG source=sched.go:219 msg="loading first model" model=/var/home/cheng/.var/app/com.jeffser.Alpaca/data/.ollama/models/blobs/sha256-8eeb52dfb3bb9aefdf9d1ef24b3bdbcfbe82238798c4b918278320b6fcef18fe
time=2024-08-30T15:52:56.823-04:00 level=DEBUG source=memory.go:101 msg=evaluating library=rocm gpu_count=1 available="[1.3 GiB]"
time=2024-08-30T15:52:56.823-04:00 level=DEBUG source=memory.go:168 msg="gpu has too little memory to allocate any layers" gpu="{memInfo:{TotalMemory:2147483648 FreeMemory:1376280576 FreeSwap:0} Library:rocm Variant:no vector extensions MinimumMemory:479199232 DependencyPath:/var/run/host/usr/lib64/rocm/gfx11/lib EnvWorkarounds:[] UnreliableFreeMemory:false ID:0 Name:1002:15bf Compute:gfx1103 DriverMajor:0 DriverMinor:0}"

The log states that the available memory is 1376280576 bytes, but the required GPU memory (MinimumMemory) is only 479199232 bytes, which is far smaller than the available memory.
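
To make the byte counts in this log easier to compare, here is a small Python sketch that converts them to MiB. It only does arithmetic on the numbers above. It does not model how Ollama decides whether layers fit, and the later 0.3.10 log in this thread suggests that layer size and compute-graph estimates are counted on top of MinimumMemory.

```python
# Byte counts copied from the memInfo field of the 0.3.3 debug log above.
MIB = 1024**2

mem_info = {
    "TotalMemory": 2147483648,
    "FreeMemory": 1376280576,
    "MinimumMemory": 479199232,
}


def to_mib(n_bytes: int) -> float:
    """Convert a byte count to mebibytes (MiB)."""
    return n_bytes / MIB


for name, value in mem_info.items():
    print(f"{name:>14}: {to_mib(value):7.1f} MiB")

# Free memory left after reserving MinimumMemory, before the model's layers,
# KV cache, or compute graph are accounted for.
headroom = mem_info["FreeMemory"] - mem_info["MinimumMemory"]
print(f"{'headroom':>14}: {to_mib(headroom):7.1f} MiB")
```

The free memory works out to roughly 1312.5 MiB of a 2048 MiB carve-out, and only about 855 MiB remains once the 457 MiB minimum is set aside.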

However, because Alpaca is packaged as a Flatpak, I applied a few hacks, and I am not sure whether they are relevant:

  • I only gave the Flatpak access to /sys/module/amdgpu/ and host-os so that it can detect the GPU.
  • I set HSA_OVERRIDE_GFX_VERSION=11.0.0 because gfx1103 is not supported by Ollama.
  • I added /var/run/host/usr/lib64/rocm/gfx11/lib to the library path so that Ollama can detect the library. Note that I used gfx11 instead of gfx1100 because gfx1100 is an empty folder on my machine.

Related issue: https://github.com/Jeffser/Alpaca/issues/139
Additional information:

  • I am running Fedora Silverblue, and the ROCm libraries are installed via rpm-ostree install rocminfo hipblas.
  • Related ROCm versions: rocminfo-6.1.1-3.fc40.x86_64 and rocm-runtime-6.1.2-1.fc40.x86_64
  • I didn't install the DKMS version of the kernel module, because DKMS plus Secure Boot is a nightmare.

To completely reproduce my setup:

  • Install Alpaca: https://flathub.org/apps/com.jeffser.Alpaca
  • Grant the following filesystem permissions: /sys/module/amdgpu/ and host-os
  • Set the following environment variables: OLLAMA_DEBUG=1, HSA_OVERRIDE_GFX_VERSION=11.0.0, LD_LIBRARY_PATH=/var/run/host/usr/lib64/rocm/gfx11/lib:/app/lib:/usr/lib/x86_64-linux-gnu/GL/default/lib:/usr/lib/x86_64-linux-gnu/openh264/extra:/usr/lib/sdk/llvm15/lib:/usr/lib/sdk/openjdk11/lib:/usr/lib/sdk/openjdk17/lib:/usr/lib/x86_64-linux-gnu/GL/default/lib
  • Enter the Alpaca sandbox: flatpak run --command=bash com.jeffser.Alpaca
  • Run ollama serve in the sandbox
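
For anyone scripting this reproduction, the environment from the steps above can be assembled in Python. This is a sketch only: `build_env` and `ROCM_LIB` are hypothetical names, and the library directory is the reporter's host path, which will differ on other systems. Rather than hard-coding the full `LD_LIBRARY_PATH` list, it prepends the ROCm directory to whatever the sandbox already provides.

```python
from collections.abc import Mapping

# Host ROCm library directory as seen from inside the Alpaca Flatpak sandbox
# (the reporter's path; adjust for your system).
ROCM_LIB = "/var/run/host/usr/lib64/rocm/gfx11/lib"


def build_env(base: Mapping[str, str]) -> dict[str, str]:
    """Return a copy of `base` with the variables from the reproduction steps.

    `build_env` is a hypothetical helper for illustration only.
    """
    env = dict(base)
    env["OLLAMA_DEBUG"] = "1"  # enable debug-level logging
    # gfx1103 is unsupported by Ollama, so present the GPU as gfx1100 (11.0.0).
    env["HSA_OVERRIDE_GFX_VERSION"] = "11.0.0"
    # Put the host ROCm libraries first without listing them twice.
    existing = [
        p for p in env.get("LD_LIBRARY_PATH", "").split(":")
        if p and p != ROCM_LIB
    ]
    env["LD_LIBRARY_PATH"] = ":".join([ROCM_LIB, *existing])
    return env
```

Inside the sandbox, something like `subprocess.run(["ollama", "serve"], env=build_env(os.environ))` would then start the server with these settings.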

OS

Linux

GPU

AMD

CPU

AMD

Ollama version

0.3.3

GiteaMirror added the memory and needs more info labels 2026-04-12 15:03:02 -05:00

@igorschlum commented on GitHub (Sep 1, 2024):

@czhang03 why version 0.3.3 and not version 0.3.9?

@Jeffser commented on GitHub (Sep 2, 2024):

> @czhang03 why version 0.3.3 and not version 0.3.9?

Hi, I'm the developer of Alpaca. Ollama gets updated with every Alpaca release; I haven't released a version since then, but one is coming soon with Ollama 0.3.9 included.

@igorschlum commented on GitHub (Sep 2, 2024):

@Jeffser I think Ollama has resolved some issues related to VRAM memory. If you update Alpaca to the latest version of Ollama, the issue you're facing might be solved.

Additionally, how can I run Alpaca on macOS? Is there a Docker solution to run Flatpak packages in Docker to make it work on macOS?

@Jeffser commented on GitHub (Sep 2, 2024):

> @Jeffser I think Ollama has resolved some issues related to VRAM memory. If you update Alpaca to the latest version of Ollama, the issue you're facing might be solved.

Alright, I will update the instance.

> Additionally, how can I run Alpaca on macOS? Is there a Docker solution to run Flatpak packages in Docker to make it work on macOS?

AFAIK there's no way to run Flatpaks in Docker or natively on macOS. I'll see if I can make a port for Mac, since GTK apps should be able to run there. It might take a couple of days, though.

@igorschlum commented on GitHub (Sep 2, 2024):

https://www.gtk.org/docs/installations/macos

Yes, it could be a good solution.

@dhiltgen commented on GitHub (Sep 5, 2024):

I believe you're running on a 780M iGPU with 2 GB assigned as VRAM in the BIOS. What model are you trying to load? I'm not sure if there's a bug here, or if you're trying to load a model that is too large for your dedicated VRAM. You can try loading a smaller model, reducing the context size, or adjusting your BIOS settings to allocate more system memory to the iGPU.
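
One of the suggestions above is to reduce the context size. When talking to Ollama directly, the context window can be set per request through the `num_ctx` option of the REST API's `/api/generate` endpoint. The sketch below assumes Ollama is listening on its default address, `http://localhost:11434`; the helper names are hypothetical, and the model name and context value in the usage line are examples rather than recommendations.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed default address


def build_request(model: str, prompt: str, num_ctx: int) -> urllib.request.Request:
    """Build a non-streaming /api/generate request with an explicit context size."""
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,
        # A smaller context window mainly shrinks the KV cache that has to
        # fit in VRAM alongside the offloaded model layers.
        "options": {"num_ctx": num_ctx},
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, num_ctx: int) -> str:
    """Send the request and return the generated text."""
    with urllib.request.urlopen(build_request(model, prompt, num_ctx)) as resp:
        return json.loads(resp.read())["response"]
```

Usage, with a running server: `print(generate("smollm:135m", "Why is the sky blue?", num_ctx=1024))`. Setting the option per request keeps the experiment local to one call instead of changing a client-wide default.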

@igorschlum commented on GitHub (Sep 5, 2024):

@Jeffser Could you try building a version with a smaller LLM like Smollm (https://ollama.com/library/smollm)? Then we could see if it fixes the issue found by @czhang03.

@czhang03 could you try running Ollama on its own and see if you can reproduce the issue? If not, I suggest closing this issue and letting @Jeffser create a new issue if he can reproduce the issue without his solution.

@Jeffser commented on GitHub (Sep 5, 2024):

Hi, Alpaca lets you download and use any model. I also updated the app with the newest Ollama version.

@igorschlum commented on GitHub (Sep 5, 2024):

@Jeffser Thank you. @czhang03 can you test again and try the older model or smollm, a smaller model?

@czhang03 commented on GitHub (Sep 9, 2024):

Thank you guys for the quick response and help.

I have tested on 0.3.9 and experienced similar issues:

time=2024-09-09T10:40:43.425-04:00 level=DEBUG source=memory.go:168 msg="gpu has too little memory to allocate any layers" gpu="{memInfo:{TotalMemory:2147483648 FreeMemory:613441536 FreeSwap:0} Library:rocm Variant: MinimumMemory:479199232 DependencyPath:/var/run/host/usr/lib64/rocm/gfx11/lib EnvWorkarounds:[] UnreliableFreeMemory:false ID:0 Name:1002:15bf Compute:gfx1103 DriverMajor:0 DriverMinor:0}"

Here the free memory is again larger than the minimum memory, yet Ollama reports that the GPU has too little memory.


On a probably unrelated note,

time=2024-09-09T10:35:32.855-04:00 level=WARN source=amd_linux.go:59 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"

It seems that, unlike 0.3.3, Ollama 0.3.9 looks for the file /sys/module/amdgpu/version, which is not present on my system:

> ls /sys/module/amdgpu/
coresize  drivers/  holders/  initsize  initstate  notes/  parameters/  refcnt  sections/  taint  uevent
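
The warning comes from Ollama looking for `/sys/module/amdgpu/version`. A quick way to check for that file yourself is sketched below in Python; `amdgpu_driver_version` is a hypothetical helper, not part of Ollama. The assumption, not confirmed in this thread, is that the file only exists when the loaded amdgpu module declares a version string (as AMD's packaged driver presumably does), and that the in-kernel module shipped by Fedora does not.

```python
from pathlib import Path
from typing import Optional


def amdgpu_driver_version(module_dir: str = "/sys/module/amdgpu") -> Optional[str]:
    """Return the contents of <module_dir>/version, or None if the file is missing."""
    version_file = Path(module_dir) / "version"
    if not version_file.is_file():
        return None
    return version_file.read_text().strip()


if __name__ == "__main__":
    version = amdgpu_driver_version()
    print(version if version is not None else "no /sys/module/amdgpu/version file")
```

On the reporter's system, given the `ls` output above, this would report that the file is missing.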

@Jeffser is pushing a Flatpak plugin to incorporate ROCm: https://github.com/flathub/flathub/pull/5552 , which might improve the current state of AMD support for Ollama in Alpaca. I will probably have more time for in-depth testing in a few days.

@dhiltgen commented on GitHub (Sep 9, 2024):

With version v0.3.10 the log message "gpu has too little memory..." will have more details about the calculations so that will help us root cause these scenarios. My suspicion is you may be setting a large context size.

@igorschlum commented on GitHub (Sep 9, 2024):

@czhang03 you can download a release candidate for version 0.3.10 here: https://github.com/ollama/ollama/releases

@czhang03 commented on GitHub (Sep 11, 2024):

Hi, I was able to get 0.3.10 running in Distrobox. I tried smollm:135m and everything seems to work fine. But when I moved to Llama 8B, I got the following errors:

time=2024-09-11T14:30:23.328-04:00 level=DEBUG source=sched.go:224 msg="loading first model" model=/var/home/cheng/.ollama/models/blobs/sha256-8eeb52dfb3bb9aefdf9d1ef24b3bdbcfbe82238798c4b918278320b6fcef18fe
time=2024-09-11T14:30:23.328-04:00 level=DEBUG source=memory.go:103 msg=evaluating library=rocm gpu_count=1 available="[633.3 MiB]"
time=2024-09-11T14:30:23.329-04:00 level=DEBUG source=memory.go:170 msg="gpu has too little memory to allocate any layers" id=0 library=rocm variant="" compute=gfx1103 driver=0.0 name=1002:15bf total="2.0 GiB" available="633.3 MiB" minimum_memory=479199232 layer_size="149.0 MiB" gpu_zer_overhead="0 B" partial_offload="677.5 MiB" full_offload="560.0 MiB"
time=2024-09-11T14:30:23.329-04:00 level=DEBUG source=memory.go:312 msg="insufficient VRAM to load any model layers"
time=2024-09-11T14:30:23.329-04:00 level=DEBUG source=memory.go:103 msg=evaluating library=rocm gpu_count=1 available="[633.3 MiB]"
time=2024-09-11T14:30:23.329-04:00 level=DEBUG source=memory.go:170 msg="gpu has too little memory to allocate any layers" id=0 library=rocm variant="" compute=gfx1103 driver=0.0 name=1002:15bf total="2.0 GiB" available="633.3 MiB" minimum_memory=479199232 layer_size="125.0 MiB" gpu_zer_overhead="0 B" partial_offload="677.5 MiB" full_offload="258.5 MiB"
time=2024-09-11T14:30:23.329-04:00 level=DEBUG source=memory.go:312 msg="insufficient VRAM to load any model layers"
time=2024-09-11T14:30:23.329-04:00 level=DEBUG source=memory.go:103 msg=evaluating library=rocm gpu_count=1 available="[633.3 MiB]"
time=2024-09-11T14:30:23.329-04:00 level=DEBUG source=memory.go:170 msg="gpu has too little memory to allocate any layers" id=0 library=rocm variant="" compute=gfx1103 driver=0.0 name=1002:15bf total="2.0 GiB" available="633.3 MiB" minimum_memory=479199232 layer_size="149.0 MiB" gpu_zer_overhead="0 B" partial_offload="677.5 MiB" full_offload="560.0 MiB"
time=2024-09-11T14:30:23.329-04:00 level=DEBUG source=memory.go:312 msg="insufficient VRAM to load any model layers"
time=2024-09-11T14:30:23.329-04:00 level=DEBUG source=memory.go:103 msg=evaluating library=rocm gpu_count=1 available="[633.3 MiB]"
time=2024-09-11T14:30:23.329-04:00 level=DEBUG source=memory.go:170 msg="gpu has too little memory to allocate any layers" id=0 library=rocm variant="" compute=gfx1103 driver=0.0 name=1002:15bf total="2.0 GiB" available="633.3 MiB" minimum_memory=479199232 layer_size="125.0 MiB" gpu_zer_overhead="0 B" partial_offload="677.5 MiB" full_offload="258.5 MiB"
time=2024-09-11T14:30:23.329-04:00 level=DEBUG source=memory.go:312 msg="insufficient VRAM to load any model layers"
time=2024-09-11T14:30:23.330-04:00 level=INFO source=server.go:101 msg="system memory" total="13.4 GiB" free="6.6 GiB" free_swap="3.9 GiB"
time=2024-09-11T14:30:23.330-04:00 level=DEBUG source=memory.go:103 msg=evaluating library=rocm gpu_count=1 available="[633.3 MiB]"
time=2024-09-11T14:30:23.330-04:00 level=DEBUG source=memory.go:170 msg="gpu has too little memory to allocate any layers" id=0 library=rocm variant="" compute=gfx1103 driver=0.0 name=1002:15bf total="2.0 GiB" available="633.3 MiB" minimum_memory=479199232 layer_size="125.0 MiB" gpu_zer_overhead="0 B" partial_offload="677.5 MiB" full_offload="258.5 MiB"
time=2024-09-11T14:30:23.330-04:00 level=DEBUG source=memory.go:312 msg="insufficient VRAM to load any model layers"

I am not sure what each value in the debug logs means, but I hope these messages are helpful.
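
The 0.3.10 fields can be compared directly. The Python sketch below parses one of the `memory.go:170` lines and adds up the minimum memory, the GPU-zero overhead, the larger of the two graph estimates, and a single layer. That sum is an assumed rough lower bound for illustration, not the exact condition in Ollama's `memory.go`, which may reserve more; even so, it already exceeds the 633.3 MiB reported as available.

```python
import re

# Excerpt of one "gpu has too little memory" line from the 0.3.10 log above.
LINE = (
    'msg="gpu has too little memory to allocate any layers" id=0 library=rocm '
    'variant="" compute=gfx1103 total="2.0 GiB" available="633.3 MiB" '
    'minimum_memory=479199232 layer_size="149.0 MiB" gpu_zer_overhead="0 B" '
    'partial_offload="677.5 MiB" full_offload="560.0 MiB"'
)

# key="quoted value" or key=bare_value, as in Go's slog text output.
FIELD = re.compile(r'(\w+)=(?:"([^"]*)"|(\S+))')
UNITS_IN_MIB = {"B": 1 / 1024**2, "KiB": 1 / 1024, "MiB": 1.0, "GiB": 1024.0}


def parse_fields(line: str) -> dict[str, str]:
    """Return the key/value pairs of a slog text line."""
    return {key: quoted or bare for key, quoted, bare in FIELD.findall(line)}


def to_mib(value: str) -> float:
    """Convert '633.3 MiB', '0 B', or a bare byte count to MiB."""
    parts = value.split()
    if len(parts) == 1:  # bare integer such as minimum_memory=479199232
        return int(parts[0]) / 1024**2
    number, unit = parts
    return float(number) * UNITS_IN_MIB[unit]


def rough_requirement_mib(f: dict[str, str]) -> float:
    """Assumed lower bound: minimum + overhead + larger graph + one layer.
    An illustration only, not the exact check in Ollama's memory.go."""
    graph = max(to_mib(f["partial_offload"]), to_mib(f["full_offload"]))
    return (to_mib(f["minimum_memory"]) + to_mib(f["gpu_zer_overhead"])
            + graph + to_mib(f["layer_size"]))


fields = parse_fields(LINE)
have = to_mib(fields["available"])
need = rough_requirement_mib(fields)
print(f"available {have:.1f} MiB, rough requirement {need:.1f} MiB")
# prints: available 633.3 MiB, rough requirement 1283.5 MiB
```

The compute-graph estimate (`partial_offload`, 677.5 MiB) is on its own larger than the available memory, which is consistent with the earlier suggestion that the context size matters here.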

@igorschlum commented on GitHub (Sep 11, 2024):

@czhang03 the message says that you have only 633.3 MiB (mebibytes) and that the LLM needs 2 GiB (gibibytes), so you would need at least three times more VRAM to run the model.

It seems that you have 1.3 GB of memory. Ollama could load the first layer of 633 MB and then nothing more due to the VRAM limitation. That is why a tiny LLM works and llama3.1 does not.

@czhang03 commented on GitHub (Sep 11, 2024):

Thanks for the info. It seems like an iGPU is not the best platform for local LLMs, then.

@czhang03 commented on GitHub (Sep 11, 2024):

BTW, I always assumed total="2.0 GiB" means my total VRAM is 2 GB, which makes sense, as that is the setting in my UEFI configuration. Is my understanding correct?

AFAIK an iGPU uses system memory as VRAM, and I have a good amount of it (16 GB). Is there any way to let Ollama request more memory to use as VRAM?

@igorschlum commented on GitHub (Sep 11, 2024):

An iGPU is a graphics processing unit that is integrated directly into the same chip as the CPU (Central Processing Unit). Unlike dedicated GPUs, which are separate from the CPU and have their own dedicated memory, iGPUs share system memory with the CPU.

Mac Studio computers that use Apple Silicon chips (such as the M1 Max, M1 Ultra, M2 Max, etc.) have a GPU integrated directly into the chip, similar to an iGPU (Integrated GPU), but it is far more powerful than traditional iGPUs.

In this context, you could technically say they use an integrated GPU, but it is not the typical iGPU (like those found in Intel processors). The GPU in Apple Silicon chips is optimized to deliver high performance, often rivaling dedicated graphics cards (dGPUs) in certain scenarios.

So, for Mac Studio with Apple Silicon, you could say they have an “iGPU,” but with capabilities that far exceed those of a typical iGPU.

@igorschlum commented on GitHub (Sep 11, 2024):

@czhang03 you can ask ChatGPT or Phind.com for help with iGPUs.

On PCs, it is often possible to manually allocate more memory to an iGPU (Integrated GPU), depending on the motherboard and BIOS. Here’s how it typically works:

1.	BIOS/UEFI Settings: On many PCs, you can enter the BIOS/UEFI settings during boot-up and manually adjust the amount of memory allocated to the iGPU. This is often found under settings related to Graphics, Integrated Peripherals, or Advanced Chipset Configuration.
2.	Fixed Allocation: You may be able to set a fixed amount of system RAM for the iGPU, such as 512MB, 1GB, or more. However, this reduces the available memory for other tasks since it’s permanently reserved for the iGPU.
3.	Dynamic Allocation: Some systems use dynamic memory allocation, where the iGPU automatically uses more RAM when needed, up to a certain limit, without requiring manual intervention.
4.	Limitations: The maximum amount of RAM you can allocate to an iGPU depends on the total RAM installed and the motherboard’s capabilities. Some lower-end systems may cap the amount of memory the iGPU can use.

In summary, yes, it’s possible to allocate more memory to the iGPU on a PC, but it varies depending on the system’s hardware and BIOS capabilities.

@czhang03 commented on GitHub (Sep 11, 2024):

It seems like there is already an issue tracking UMA support in Ollama: https://github.com/ollama/ollama/issues/2637

llama.cpp has a compile-time flag that enables dynamic allocation of VRAM instead of the default check-and-fail approach: https://github.com/Mozilla-Ocho/llamafile/discussions/366

Reference: github-starred/ollama#4138