[GH-ISSUE #1590] Add support for Intel Arc GPUs #879

New Issue

https://github.com/ggerganov/llama.cpp/pull/2690

@Leo512bit commented on GitHub (Feb 4, 2024):

It looks like llama.cpp now supports SYCL for Intel GPUs. Is Arc support now possible?

@Leo512bit commented on GitHub (Feb 4, 2024): It looks like llama.cpp now supports SYCL for Intel GPUs. Is Arc support now possible? https://github.com/ggerganov/llama.cpp/pull/2690

GiteaMirror commented

@uxdesignerhector commented on GitHub (Feb 4, 2024):

Last Automatic1111 update 1.7.0 included IPEX and initial support for Intel Arc GPUs on Windows, maybe someone could have a look a see what they have done to make it possible. I know this is for Windows only, but is shows that it is possible to integrate it while on Linux it should be easier as Windows support came later.

I'm aware that maybe WSL is another different beast, I remember having too much trouble installing Automatic1111 and accessing my Intel Arc GPU due to some limitation with the memory and privileges hardcoded into WSL

@uxdesignerhector commented on GitHub (Feb 4, 2024): Last [Automatic1111](https://github.com/AUTOMATIC1111) update [1.7.0](https://github.com/AUTOMATIC1111/stable-diffusion-webui/pull/14171) included [IPEX](https://github.com/intel/intel-extension-for-pytorch) and initial support for Intel Arc GPUs on Windows, maybe someone could have a look a see what they have done to make it possible. I know this is for Windows only, but is shows that it is possible to integrate it while on Linux it should be easier as Windows support came later. I'm aware that maybe [WSL](https://learn.microsoft.com/en-us/windows/wsl/) is another different beast, I remember having too much trouble installing [Automatic1111](https://github.com/AUTOMATIC1111) and accessing my Intel Arc GPU due to some limitation with the memory and privileges hardcoded into [WSL](https://learn.microsoft.com/en-us/windows/wsl/)

GiteaMirror commented

@felipeagc commented on GitHub (Feb 12, 2024):

Hey everyone, I made some progress on adding Intel Arc support to ollama: #2458

@felipeagc commented on GitHub (Feb 12, 2024): Hey everyone, I made some progress on adding Intel Arc support to ollama: #2458

GiteaMirror commented

2026-04-12 10:32:27 -05:00

@ghost commented on GitHub (Feb 13, 2024):

Thank you @felipeagc

@ghost commented on GitHub (Feb 13, 2024): Thank you @felipeagc

GiteaMirror commented

@tannisroot commented on GitHub (Apr 24, 2024):

Support for SYCL/Intel GPUs would be quite interesting because:

Intel offers by far the cheapest 16GB VRAM GPU, A770, costing only $279.99 and packing more than enough performance for inference. RTX 4060 Ti with the same amount of VRAM costs at least $459.99.
Intel also offers the cheapest discrete GPU that is not a hot pile of garbage, the A380.
It is a very popular choice for home servers, since it has very good transcoding compatibility with Jellyfin, and is also supported by Frigate for ML workloads.
With 6GB of VRAM, it should be capable of running competent small models like llama3, which in combination with Home Assistant can be used to power a completely local voice assistant and destroy the likes of Alexa and Google Assistant comprehension wise.
Upcoming Battlemage GPUs might offer even more competitive hardware for inference workloads.

@tannisroot commented on GitHub (Apr 24, 2024): Support for SYCL/Intel GPUs would be quite interesting because: 1) Intel offers by far the cheapest 16GB VRAM GPU, A770, costing only $279.99 and packing more than enough performance for inference. RTX 4060 Ti with the same amount of VRAM costs at least $459.99. 2) Intel also offers the cheapest discrete GPU that is not a hot pile of garbage, the A380. It is a very popular choice for home servers, since it has very good transcoding compatibility with Jellyfin, and is also supported by Frigate for ML workloads. With 6GB of VRAM, it should be capable of running competent small models like llama3, which in combination with Home Assistant can be used to power a completely local voice assistant and destroy the likes of Alexa and Google Assistant comprehension wise. 3) Upcoming Battlemage GPUs might offer even more competitive hardware for inference workloads.

GiteaMirror commented

2026-04-12 10:32:27 -05:00

@Kamryx commented on GitHub (Apr 25, 2024):

Extremely eager to have support for Arc GPUs. Have an A380 idle in my home server ready to be put to use. As the above commenter said, probably the best price/performance GPU for this work load.

I have an ultra layman and loose understanding of all this stuff, but have I correctly surmised that llama.cpp essentially already has Arc support, and it just needs to be implemented/merged into Ollama? And if that’s the case, are we probably in the final stretch?

@Kamryx commented on GitHub (Apr 25, 2024): Extremely eager to have support for Arc GPUs. Have an A380 idle in my home server ready to be put to use. As the above commenter said, probably the best price/performance GPU for this work load. I have an ultra layman and loose understanding of all this stuff, but have I correctly surmised that llama.cpp essentially already has Arc support, and it just needs to be implemented/merged into Ollama? And if that’s the case, are we probably in the final stretch?

GiteaMirror commented

@asknight1980 commented on GitHub (May 5, 2024):

I too have an A380 sitting idle in my R520 anxiously waiting for Ollama to recognize it. Thank you all for the progress you have contributed to this.

@asknight1980 commented on GitHub (May 5, 2024): I too have an A380 sitting idle in my R520 anxiously waiting for Ollama to recognize it. Thank you all for the progress you have contributed to this.

GiteaMirror commented

@kozuch commented on GitHub (Jun 6, 2024):

Is this now done with the merge of https://github.com/ollama/ollama/pull/3278 that has been released in v0.1.140?

@kozuch commented on GitHub (Jun 6, 2024): Is this now done with the merge of https://github.com/ollama/ollama/pull/3278 that has been released in v0.1.140?

GiteaMirror commented

@dhiltgen commented on GitHub (Jun 6, 2024):

@kozuch not quite. It's close.

If you build locally from source, it should work, but we haven't integrated it into our official builds yet.

@dhiltgen commented on GitHub (Jun 6, 2024): @kozuch not quite. It's close. If you build locally from source, it should work, but we haven't integrated it into our official builds yet.

GiteaMirror commented

2026-04-12 10:32:29 -05:00

@uxdesignerhector commented on GitHub (Jun 7, 2024):

@dhiltgen do you know if this will work on WSL or Windows or only Linux?

@uxdesignerhector commented on GitHub (Jun 7, 2024): @dhiltgen do you know if this will work on [WSL](https://learn.microsoft.com/en-us/windows/wsl/) or Windows or only Linux?

GiteaMirror commented

@dhiltgen commented on GitHub (Jun 7, 2024):

The Linux build is already covered in #4876 and my goal is to enable windows as well. This doc implies WSL2 should work.

@dhiltgen commented on GitHub (Jun 7, 2024): The Linux build is already covered in #4876 and my goal is to enable windows as well. [This doc](https://www.intel.com/content/www/us/en/docs/oneapi/installation-guide-linux/2023-0/configure-wsl-2-for-gpu-workflows.html) implies WSL2 should work.

GiteaMirror commented

2026-04-12 10:32:29 -05:00

@marcoleder commented on GitHub (Jun 11, 2024):

Looking forward to it! Let me know once it is available for Windows :)

@marcoleder commented on GitHub (Jun 11, 2024): Looking forward to it! Let me know once it is available for Windows :)

GiteaMirror commented

@kozuch commented on GitHub (Jun 12, 2024):

@kozuch not quite. It's close.

If you build locally from source, it should work, but we haven't integrated it into our official builds yet.

You are not branching the releases off main? Why was the https://github.com/ollama/ollama/pull/3278 change seen in https://github.com/ollama/ollama/compare/v0.1.39...v0.1.40 changelist then?

@kozuch commented on GitHub (Jun 12, 2024): > @kozuch not quite. It's close. > > If you build locally from source, it should work, but we haven't integrated it into our official builds yet. You are not branching the releases off main? Why was the https://github.com/ollama/ollama/pull/3278 change seen in https://github.com/ollama/ollama/compare/v0.1.39...v0.1.40 changelist then?

GiteaMirror commented

@WeihanLi commented on GitHub (Jun 12, 2024):

Is there a release schedule for this?

@WeihanLi commented on GitHub (Jun 12, 2024): Is there a release schedule for this?

GiteaMirror commented

@asknight1980 commented on GitHub (Jun 14, 2024):

How can I build it to enable Intel Arc?

Install required tools:
go version 1.22 or higher

/builds/ollama-0.1.44/go.mod:3: invalid go version '1.22.0': must match format 1.23
Go 1.23 has been either been pulled back or isn't clearly available.

@asknight1980 commented on GitHub (Jun 14, 2024): How can I build it to enable Intel Arc? > Install required tools: go version 1.22 or higher > /builds/ollama-0.1.44/go.mod:3: invalid go version '1.22.0': must match format 1.23 Go 1.23 has been either been pulled back or isn't clearly available.

GiteaMirror commented

@dhiltgen commented on GitHub (Jun 19, 2024):

Unfortunately users have reported crashing in the Intel GPU management library on some windows systems, so we've had to disable it temporarily until we figure out what's causing the crash. You can re-enable it by setting OLLAMA_INTEL_GPU=1

We don't have docs explaining how to build since it's not reliable yet. You can take a look at the gen_linux.sh and gen_windows.ps1 scripts here for some inspiration on the required tools.

@dhiltgen commented on GitHub (Jun 19, 2024): Unfortunately users have reported crashing in the Intel GPU management library on some windows systems, so we've had to disable it temporarily until we figure out what's causing the crash. You can re-enable it by setting OLLAMA_INTEL_GPU=1 We don't have docs explaining how to build since it's not reliable yet. You can take a look at the gen_linux.sh and gen_windows.ps1 scripts [here](https://github.com/ollama/ollama/tree/main/llm/generate) for some inspiration on the required tools.

GiteaMirror commented

2026-04-12 10:32:31 -05:00

@dhiltgen commented on GitHub (Jun 19, 2024):

Quick update - the crash is fixed on main now, but we'll keep it behind the env var I mentioned above until we get #4876 merged and the resulting binaries validated on linux and windows with Arc GPUs.

@dhiltgen commented on GitHub (Jun 19, 2024): Quick update - the crash is fixed on main now, but we'll keep it behind the env var I mentioned above until we get #4876 merged and the resulting binaries validated on linux and windows with Arc GPUs.

GiteaMirror commented

@ConnorMeng commented on GitHub (Jun 20, 2024):

Sorry if it isn't appropriate to ask this here, but when do you think this will reach the docker image, and when might there be some documentation for that as well?

@ConnorMeng commented on GitHub (Jun 20, 2024): Sorry if it isn't appropriate to ask this here, but when do you think this will reach the docker image, and when might there be some documentation for that as well?

GiteaMirror commented

2026-04-12 10:32:31 -05:00

@YumingChang02 commented on GitHub (Jul 5, 2024):

Is there any possibility to manual / auto detect internal gpu size? it seems igpu is detected as a oneapi compute device

"inference compute" id=0 library=oneapi compute="" driver=0.0 name="Intel(R) UHD Graphics" total="0 B" available="0 B"

But it seems that it is not correctly detecting igpu memory size
Note this is what i see using Arc A380

"inference compute" id=0 library=oneapi compute="" driver=0.0 name="Intel(R) Arc(TM) A380 Graphics" total="5.9 GiB" available="5.6 GiB"

I am guessing this is what prevent igpu from working?

@YumingChang02 commented on GitHub (Jul 5, 2024): Is there any possibility to manual / auto detect internal gpu size? it seems igpu is detected as a oneapi compute device ``` "inference compute" id=0 library=oneapi compute="" driver=0.0 name="Intel(R) UHD Graphics" total="0 B" available="0 B" ``` But it seems that it is not correctly detecting igpu memory size Note this is what i see using Arc A380 ``` "inference compute" id=0 library=oneapi compute="" driver=0.0 name="Intel(R) Arc(TM) A380 Graphics" total="5.9 GiB" available="5.6 GiB" ``` I am guessing this is what prevent igpu from working?

GiteaMirror commented

2026-04-12 10:32:32 -05:00

@asknight1980 commented on GitHub (Jul 5, 2024):

Are you able to do any inference at all on the Arc A380? I am showing it loading the model in GPU memory on my A380 but the processing is still happening on the CPU while the GPU sits idle.

Jul 05 18:25:13 cyka-b ollama[578885]: 2024/07/05 18:25:13 routes.go:1064: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_G>
Jul 05 18:25:17 cyka-b ollama[578885]: time=2024-07-05T18:25:17.512-05:00 level=INFO source=types.go:98 msg="inference compute" id=0 library=oneapi compute="" driver=0.0 name="Inte>

NAME ID SIZE PROCESSOR UNTIL
tinyllama:latest 2644915ede35 827 MB 100% GPU 4 minutes from now

@asknight1980 commented on GitHub (Jul 5, 2024): Are you able to do any inference at all on the Arc A380? I am showing it loading the model in GPU memory on my A380 but the processing is still happening on the CPU while the GPU sits idle. Jul 05 18:25:13 cyka-b ollama[578885]: 2024/07/05 18:25:13 routes.go:1064: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_G> Jul 05 18:25:17 cyka-b ollama[578885]: time=2024-07-05T18:25:17.512-05:00 level=INFO source=types.go:98 msg="inference compute" id=0 library=oneapi compute="" driver=0.0 name="Inte> NAME ID SIZE PROCESSOR UNTIL tinyllama:latest 2644915ede35 827 MB 100% GPU 4 minutes from now

GiteaMirror commented

2026-04-12 10:32:32 -05:00

@MordragT commented on GitHub (Jul 7, 2024):

Is there any way to make ollama find the neo driver's libigdrcl.so library for opencl ? On my setup ollama always returns:

Jul 07 14:56:36 tom-desktop ollama[240788]: found 1 SYCL devices:
Jul 07 14:56:36 tom-desktop ollama[240788]: |  |                   |                                       |       |Max    |        |Max  |Global |                     |
Jul 07 14:56:36 tom-desktop ollama[240788]: |  |                   |                                       |       |compute|Max work|sub  |mem    |                     |
Jul 07 14:56:36 tom-desktop ollama[240788]: |ID|        Device Type|                                   Name|Version|units  |group   |group|size   |       Driver version|
Jul 07 14:56:36 tom-desktop ollama[240788]: |--|-------------------|---------------------------------------|-------|-------|--------|-----|-------|---------------------|
Jul 07 14:56:36 tom-desktop ollama[240788]: | 0| [level_zero:gpu:0]|                Intel Arc A750 Graphics|    1.3|    448|    1024|   32|  8096M|            1.3.29735|

And then a bit later:

Jul 07 14:56:36 tom-desktop ollama[240788]: Build program log for 'Intel(R) Arc(TM) A750 Graphics':
Jul 07 14:56:36 tom-desktop ollama[240788]:  -999 (Unknown PI error)Exception caught at file:/build/source/llm/llama.cpp/ggml/src/ggml-sycl.cpp, line:3121

I reproduced the error with llama-cpp and it seems like if llama-cpp can only find the level-zero device and not the opencl one it will throw the exception.

@MordragT commented on GitHub (Jul 7, 2024): Is there any way to make ollama find the neo driver's libigdrcl.so library for opencl ? On my setup ollama always returns: ``` Jul 07 14:56:36 tom-desktop ollama[240788]: found 1 SYCL devices: Jul 07 14:56:36 tom-desktop ollama[240788]: | | | | |Max | |Max |Global | | Jul 07 14:56:36 tom-desktop ollama[240788]: | | | | |compute|Max work|sub |mem | | Jul 07 14:56:36 tom-desktop ollama[240788]: |ID| Device Type| Name|Version|units |group |group|size | Driver version| Jul 07 14:56:36 tom-desktop ollama[240788]: |--|-------------------|---------------------------------------|-------|-------|--------|-----|-------|---------------------| Jul 07 14:56:36 tom-desktop ollama[240788]: | 0| [level_zero:gpu:0]| Intel Arc A750 Graphics| 1.3| 448| 1024| 32| 8096M| 1.3.29735| ``` And then a bit later: ``` Jul 07 14:56:36 tom-desktop ollama[240788]: Build program log for 'Intel(R) Arc(TM) A750 Graphics': Jul 07 14:56:36 tom-desktop ollama[240788]: -999 (Unknown PI error)Exception caught at file:/build/source/llm/llama.cpp/ggml/src/ggml-sycl.cpp, line:3121 ``` I reproduced the error with llama-cpp and it seems like if llama-cpp can only find the level-zero device and not the opencl one it will throw the exception.

GiteaMirror commented

2026-04-12 10:32:32 -05:00

@Yueming-Yan commented on GitHub (Jul 11, 2024):

Looking forward :)

Intel(R) Iris(R) Xe Graphics

time=2024-07-11T12:02:14.704+08:00 level=INFO source=gpu.go:205 msg="looking for compatible GPUs"
time=2024-07-11T12:02:15.136+08:00 level=INFO source=gpu.go:324 msg="no compatible GPUs were discovered"

Append some useful links:
https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md
https://github.com/intel-analytics/ipex-llm/blob/main/docs/mddocs/Quickstart/ollama_quickstart.md

@Yueming-Yan commented on GitHub (Jul 11, 2024): Looking forward :) Intel(R) Iris(R) Xe Graphics ``` time=2024-07-11T12:02:14.704+08:00 level=INFO source=gpu.go:205 msg="looking for compatible GPUs" time=2024-07-11T12:02:15.136+08:00 level=INFO source=gpu.go:324 msg="no compatible GPUs were discovered" ``` Append some useful links: https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md https://github.com/intel-analytics/ipex-llm/blob/main/docs/mddocs/Quickstart/ollama_quickstart.md

GiteaMirror commented

2026-04-12 10:32:33 -05:00

@TheSpaceGod commented on GitHub (Jul 18, 2024):

Out of curiosity, what is holding up this PR (https://github.com/ollama/ollama/pull/4876) making it to main? It looks like its passing all the relevant PR tests.
I think this would be a real game changer for all the people running small LLM models via docker on Intel NUC style computers like myself.

@TheSpaceGod commented on GitHub (Jul 18, 2024): Out of curiosity, what is holding up this PR (https://github.com/ollama/ollama/pull/4876) making it to main? It looks like its passing all the relevant PR tests. I think this would be a real game changer for all the people running small LLM models via docker on Intel NUC style computers like myself.

GiteaMirror commented

2026-04-12 10:32:33 -05:00

@tannisroot commented on GitHub (Jul 18, 2024):

Out of curiosity, what is holding up this PR (https://github.com/ollama/ollama/pull/4876) making it to main? It looks like its passing all the relevant PR tests.
I think this would be a real game changer for all the people running small LLM models via docker on Intel NUC style computers like myself.

The Windows driver for Intel is crashing with Ollama.
Honestly as a Linux user it's a little bit annoying, I imagine majority of people who want to use Ollama with Intel GPU plan to do so in their Linux box.
It's also not guranteed Intel will fix it any time soon, I remember another open source project DXVK encountered major crashing bugs exclusive to the Windows Intel driver, and it took years for things to get fixed afaik (if they are even fully fixed).

@tannisroot commented on GitHub (Jul 18, 2024): > Out of curiosity, what is holding up this PR (https://github.com/ollama/ollama/pull/4876) making it to main? It looks like its passing all the relevant PR tests. > I think this would be a real game changer for all the people running small LLM models via docker on Intel NUC style computers like myself. The Windows driver for Intel is crashing with Ollama. Honestly as a Linux user it's a little bit annoying, I imagine majority of people who want to use Ollama with Intel GPU plan to do so in their Linux box. It's also not guranteed Intel will fix it any time soon, I remember another open source project DXVK encountered major crashing bugs exclusive to the Windows Intel driver, and it took years for things to get fixed afaik (if they are even fully fixed).

GiteaMirror commented

2026-04-12 10:32:33 -05:00

@lirc571 commented on GitHub (Jul 18, 2024):

Some works are being done at #5593 and on llama.cpp side by Intel people. Looks like they are actively working on it!

@lirc571 commented on GitHub (Jul 18, 2024): Some works are being done at #5593 and on llama.cpp side by Intel people. Looks like they are actively working on it!

GiteaMirror commented

@tannisroot commented on GitHub (Jul 19, 2024):

Some works are being done at #5593 and on llama.cpp side by Intel people. Looks like they are actively working on it!

Oh then that is very good news!

@tannisroot commented on GitHub (Jul 19, 2024): > Some works are being done at #5593 and on llama.cpp side by Intel people. Looks like they are actively working on it! Oh then that is very good news!

GiteaMirror commented

@MarkWard0110 commented on GitHub (Jul 22, 2024):

Does this include support for the integrated GPU? For example, the Intel Core i9 14900k has an integrated GPU. When I enable the feature on Ubuntu Server 22.04 it crashes. OLLAMA_INTEL_GPU=1

I am curious to know if there are dependencies to have installed for this to work.

Jul 22 15:27:34 quorra systemd[1]: Started Ollama Service.
Jul 22 15:27:34 quorra ollama[3678911]: 2024/07/22 15:27:34 routes.go:1096: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: OLLAMA_DEBUG:true OLLAMA_FLASH_ATTENTION:false OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:true OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MAX_VRAM:0 OLLAMA_MODELS:/usr/share/ollama/.ollama/models OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_RUNNERS_DIR: OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES:]"
Jul 22 15:27:34 quorra ollama[3678911]: time=2024-07-22T15:27:34.349Z level=INFO source=images.go:778 msg="total blobs: 81"
Jul 22 15:27:34 quorra ollama[3678911]: time=2024-07-22T15:27:34.350Z level=INFO source=images.go:785 msg="total unused blobs removed: 0"
Jul 22 15:27:34 quorra ollama[3678911]: time=2024-07-22T15:27:34.350Z level=INFO source=routes.go:1143 msg="Listening on [::]:11434 (version 0.2.7)"
Jul 22 15:27:34 quorra ollama[3678911]: time=2024-07-22T15:27:34.350Z level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama2597150250/runners
Jul 22 15:27:34 quorra ollama[3678911]: time=2024-07-22T15:27:34.350Z level=DEBUG source=payload.go:182 msg=extracting variant=cpu file=build/linux/x86_64/cpu/bin/ollama_llama_server.gz
Jul 22 15:27:34 quorra ollama[3678911]: time=2024-07-22T15:27:34.350Z level=DEBUG source=payload.go:182 msg=extracting variant=cpu_avx file=build/linux/x86_64/cpu_avx/bin/ollama_llama_server.gz
Jul 22 15:27:34 quorra ollama[3678911]: time=2024-07-22T15:27:34.350Z level=DEBUG source=payload.go:182 msg=extracting variant=cpu_avx2 file=build/linux/x86_64/cpu_avx2/bin/ollama_llama_server.gz
Jul 22 15:27:34 quorra ollama[3678911]: time=2024-07-22T15:27:34.350Z level=DEBUG source=payload.go:182 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/libcublas.so.11.gz
Jul 22 15:27:34 quorra ollama[3678911]: time=2024-07-22T15:27:34.350Z level=DEBUG source=payload.go:182 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/libcublasLt.so.11.gz
Jul 22 15:27:34 quorra ollama[3678911]: time=2024-07-22T15:27:34.350Z level=DEBUG source=payload.go:182 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/libcudart.so.11.0.gz
Jul 22 15:27:34 quorra ollama[3678911]: time=2024-07-22T15:27:34.350Z level=DEBUG source=payload.go:182 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/ollama_llama_server.gz
Jul 22 15:27:34 quorra ollama[3678911]: time=2024-07-22T15:27:34.350Z level=DEBUG source=payload.go:182 msg=extracting variant=rocm_v60102 file=build/linux/x86_64/rocm_v60102/bin/deps.txt.gz
Jul 22 15:27:34 quorra ollama[3678911]: time=2024-07-22T15:27:34.350Z level=DEBUG source=payload.go:182 msg=extracting variant=rocm_v60102 file=build/linux/x86_64/rocm_v60102/bin/ollama_llama_server.gz
Jul 22 15:27:35 quorra ollama[3678911]: time=2024-07-22T15:27:35.824Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama2597150250/runners/cpu/ollama_llama_server
Jul 22 15:27:35 quorra ollama[3678911]: time=2024-07-22T15:27:35.824Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama2597150250/runners/cpu_avx/ollama_llama_server
Jul 22 15:27:35 quorra ollama[3678911]: time=2024-07-22T15:27:35.824Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama2597150250/runners/cpu_avx2/ollama_llama_server
Jul 22 15:27:35 quorra ollama[3678911]: time=2024-07-22T15:27:35.824Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama2597150250/runners/cuda_v11/ollama_llama_server
Jul 22 15:27:35 quorra ollama[3678911]: time=2024-07-22T15:27:35.824Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama2597150250/runners/rocm_v60102/ollama_llama_server
Jul 22 15:27:35 quorra ollama[3678911]: time=2024-07-22T15:27:35.824Z level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cuda_v11 rocm_v60102 cpu cpu_avx cpu_avx2]"
Jul 22 15:27:35 quorra ollama[3678911]: time=2024-07-22T15:27:35.824Z level=DEBUG source=payload.go:45 msg="Override detection logic by setting OLLAMA_LLM_LIBRARY"
Jul 22 15:27:35 quorra ollama[3678911]: time=2024-07-22T15:27:35.824Z level=DEBUG source=sched.go:102 msg="starting llm scheduler"
Jul 22 15:27:35 quorra ollama[3678911]: time=2024-07-22T15:27:35.824Z level=INFO source=gpu.go:205 msg="looking for compatible GPUs"
Jul 22 15:27:35 quorra ollama[3678911]: time=2024-07-22T15:27:35.824Z level=DEBUG source=gpu.go:91 msg="searching for GPU discovery libraries for NVIDIA"
Jul 22 15:27:35 quorra ollama[3678911]: time=2024-07-22T15:27:35.824Z level=DEBUG source=gpu.go:468 msg="Searching for GPU library" name=libcuda.so*
Jul 22 15:27:35 quorra ollama[3678911]: time=2024-07-22T15:27:35.824Z level=DEBUG source=gpu.go:487 msg="gpu library search" globs="[/libcuda.so** /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]"
Jul 22 15:27:35 quorra ollama[3678911]: time=2024-07-22T15:27:35.824Z level=DEBUG source=gpu.go:521 msg="discovered GPU libraries" paths=[/usr/lib/x86_64-linux-gnu/libcuda.so.555.42.06]
Jul 22 15:27:35 quorra ollama[3678911]: CUDA driver version: 12.5
Jul 22 15:27:35 quorra ollama[3678911]: time=2024-07-22T15:27:35.903Z level=DEBUG source=gpu.go:124 msg="detected GPUs" count=1 library=/usr/lib/x86_64-linux-gnu/libcuda.so.555.42.06
Jul 22 15:27:35 quorra ollama[3678911]: [GPU-007c9d9a-8177-bd6f-7654-45652102b937] CUDA totalMem 15981 mb
Jul 22 15:27:35 quorra ollama[3678911]: [GPU-007c9d9a-8177-bd6f-7654-45652102b937] CUDA freeMem 15763 mb
Jul 22 15:27:35 quorra ollama[3678911]: [GPU-007c9d9a-8177-bd6f-7654-45652102b937] Compute Capability 8.9
Jul 22 15:27:36 quorra ollama[3678911]: time=2024-07-22T15:27:36.027Z level=DEBUG source=gpu.go:468 msg="Searching for GPU library" name=libze_intel_gpu.so
Jul 22 15:27:36 quorra ollama[3678911]: time=2024-07-22T15:27:36.027Z level=DEBUG source=gpu.go:487 msg="gpu library search" globs="[/libze_intel_gpu.so* /usr/lib/x86_64-linux-gnu/libze_intel_gpu.so* /usr/lib*/libze_intel_gpu.so*]"
Jul 22 15:27:36 quorra ollama[3678911]: time=2024-07-22T15:27:36.027Z level=DEBUG source=gpu.go:521 msg="discovered GPU libraries" paths=[]
Jul 22 15:27:36 quorra ollama[3678911]: releasing cuda driver library
Jul 22 15:27:36 quorra ollama[3678911]: panic: runtime error: invalid memory address or nil pointer dereference
Jul 22 15:27:36 quorra ollama[3678911]: [signal SIGSEGV: segmentation violation code=0x1 addr=0xc pc=0x832ad7]
Jul 22 15:27:36 quorra ollama[3678911]: goroutine 1 [running]:
Jul 22 15:27:36 quorra ollama[3678911]: github.com/ollama/ollama/gpu.GetGPUInfo()
Jul 22 15:27:36 quorra ollama[3678911]:         github.com/ollama/ollama/gpu/gpu.go:313 +0xdf7
Jul 22 15:27:36 quorra ollama[3678911]: github.com/ollama/ollama/server.Serve({0x1de902f8, 0xc000709b00})
Jul 22 15:27:36 quorra ollama[3678911]:         github.com/ollama/ollama/server/routes.go:1176 +0x7a5
Jul 22 15:27:36 quorra ollama[3678911]: github.com/ollama/ollama/cmd.RunServer(0xc00004cd00?, {0x1e723860?, 0x4?, 0x12a4ec5?})
Jul 22 15:27:36 quorra ollama[3678911]:         github.com/ollama/ollama/cmd/cmd.go:1084 +0xfa
Jul 22 15:27:36 quorra ollama[3678911]: github.com/spf13/cobra.(*Command).execute(0xc000174308, {0x1e723860, 0x0, 0x0})
Jul 22 15:27:36 quorra ollama[3678911]:         github.com/spf13/cobra@v1.7.0/command.go:940 +0x882
Jul 22 15:27:36 quorra ollama[3678911]: github.com/spf13/cobra.(*Command).ExecuteC(0xc000123508)
Jul 22 15:27:36 quorra ollama[3678911]:         github.com/spf13/cobra@v1.7.0/command.go:1068 +0x3a5
Jul 22 15:27:36 quorra ollama[3678911]: github.com/spf13/cobra.(*Command).Execute(...)
Jul 22 15:27:36 quorra ollama[3678911]:         github.com/spf13/cobra@v1.7.0/command.go:992
Jul 22 15:27:36 quorra ollama[3678911]: github.com/spf13/cobra.(*Command).ExecuteContext(...)
Jul 22 15:27:36 quorra ollama[3678911]:         github.com/spf13/cobra@v1.7.0/command.go:985
Jul 22 15:27:36 quorra ollama[3678911]: main.main()
Jul 22 15:27:36 quorra ollama[3678911]:         github.com/ollama/ollama/main.go:11 +0x4d
Jul 22 15:27:36 quorra systemd[1]: ollama.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Jul 22 15:27:36 quorra systemd[1]: ollama.service: Failed with result 'exit-code'.
Jul 22 15:27:36 quorra systemd[1]: ollama.service: Consumed 4.799s CPU time.
Jul 22 15:27:39 quorra systemd[1]: ollama.service: Scheduled restart job, restart counter is at 29.
Jul 22 15:27:39 quorra systemd[1]: Stopped Ollama Service.
Jul 22 15:27:39 quorra systemd[1]: ollama.service: Consumed 4.799s CPU time.

@MarkWard0110 commented on GitHub (Jul 22, 2024): Does this include support for the integrated GPU? For example, the Intel Core i9 14900k has an integrated GPU. When I enable the feature on Ubuntu Server 22.04 it crashes. `OLLAMA_INTEL_GPU=1` I am curious to know if there are dependencies to have installed for this to work. ``` Jul 22 15:27:34 quorra systemd[1]: Started Ollama Service. Jul 22 15:27:34 quorra ollama[3678911]: 2024/07/22 15:27:34 routes.go:1096: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: OLLAMA_DEBUG:true OLLAMA_FLASH_ATTENTION:false OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:true OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MAX_VRAM:0 OLLAMA_MODELS:/usr/share/ollama/.ollama/models OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_RUNNERS_DIR: OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES:]" Jul 22 15:27:34 quorra ollama[3678911]: time=2024-07-22T15:27:34.349Z level=INFO source=images.go:778 msg="total blobs: 81" Jul 22 15:27:34 quorra ollama[3678911]: time=2024-07-22T15:27:34.350Z level=INFO source=images.go:785 msg="total unused blobs removed: 0" Jul 22 15:27:34 quorra ollama[3678911]: time=2024-07-22T15:27:34.350Z level=INFO source=routes.go:1143 msg="Listening on [::]:11434 (version 0.2.7)" Jul 22 15:27:34 quorra ollama[3678911]: time=2024-07-22T15:27:34.350Z level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama2597150250/runners Jul 22 15:27:34 quorra ollama[3678911]: time=2024-07-22T15:27:34.350Z level=DEBUG source=payload.go:182 msg=extracting variant=cpu file=build/linux/x86_64/cpu/bin/ollama_llama_server.gz Jul 22 15:27:34 quorra ollama[3678911]: time=2024-07-22T15:27:34.350Z level=DEBUG source=payload.go:182 msg=extracting variant=cpu_avx file=build/linux/x86_64/cpu_avx/bin/ollama_llama_server.gz Jul 22 15:27:34 quorra ollama[3678911]: time=2024-07-22T15:27:34.350Z level=DEBUG source=payload.go:182 msg=extracting variant=cpu_avx2 file=build/linux/x86_64/cpu_avx2/bin/ollama_llama_server.gz Jul 22 15:27:34 quorra ollama[3678911]: time=2024-07-22T15:27:34.350Z level=DEBUG source=payload.go:182 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/libcublas.so.11.gz Jul 22 15:27:34 quorra ollama[3678911]: time=2024-07-22T15:27:34.350Z level=DEBUG source=payload.go:182 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/libcublasLt.so.11.gz Jul 22 15:27:34 quorra ollama[3678911]: time=2024-07-22T15:27:34.350Z level=DEBUG source=payload.go:182 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/libcudart.so.11.0.gz Jul 22 15:27:34 quorra ollama[3678911]: time=2024-07-22T15:27:34.350Z level=DEBUG source=payload.go:182 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/ollama_llama_server.gz Jul 22 15:27:34 quorra ollama[3678911]: time=2024-07-22T15:27:34.350Z level=DEBUG source=payload.go:182 msg=extracting variant=rocm_v60102 file=build/linux/x86_64/rocm_v60102/bin/deps.txt.gz Jul 22 15:27:34 quorra ollama[3678911]: time=2024-07-22T15:27:34.350Z level=DEBUG source=payload.go:182 msg=extracting variant=rocm_v60102 file=build/linux/x86_64/rocm_v60102/bin/ollama_llama_server.gz Jul 22 15:27:35 quorra ollama[3678911]: time=2024-07-22T15:27:35.824Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama2597150250/runners/cpu/ollama_llama_server Jul 22 15:27:35 quorra ollama[3678911]: time=2024-07-22T15:27:35.824Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama2597150250/runners/cpu_avx/ollama_llama_server Jul 22 15:27:35 quorra ollama[3678911]: time=2024-07-22T15:27:35.824Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama2597150250/runners/cpu_avx2/ollama_llama_server Jul 22 15:27:35 quorra ollama[3678911]: time=2024-07-22T15:27:35.824Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama2597150250/runners/cuda_v11/ollama_llama_server Jul 22 15:27:35 quorra ollama[3678911]: time=2024-07-22T15:27:35.824Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama2597150250/runners/rocm_v60102/ollama_llama_server Jul 22 15:27:35 quorra ollama[3678911]: time=2024-07-22T15:27:35.824Z level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cuda_v11 rocm_v60102 cpu cpu_avx cpu_avx2]" Jul 22 15:27:35 quorra ollama[3678911]: time=2024-07-22T15:27:35.824Z level=DEBUG source=payload.go:45 msg="Override detection logic by setting OLLAMA_LLM_LIBRARY" Jul 22 15:27:35 quorra ollama[3678911]: time=2024-07-22T15:27:35.824Z level=DEBUG source=sched.go:102 msg="starting llm scheduler" Jul 22 15:27:35 quorra ollama[3678911]: time=2024-07-22T15:27:35.824Z level=INFO source=gpu.go:205 msg="looking for compatible GPUs" Jul 22 15:27:35 quorra ollama[3678911]: time=2024-07-22T15:27:35.824Z level=DEBUG source=gpu.go:91 msg="searching for GPU discovery libraries for NVIDIA" Jul 22 15:27:35 quorra ollama[3678911]: time=2024-07-22T15:27:35.824Z level=DEBUG source=gpu.go:468 msg="Searching for GPU library" name=libcuda.so* Jul 22 15:27:35 quorra ollama[3678911]: time=2024-07-22T15:27:35.824Z level=DEBUG source=gpu.go:487 msg="gpu library search" globs="[/libcuda.so** /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]" Jul 22 15:27:35 quorra ollama[3678911]: time=2024-07-22T15:27:35.824Z level=DEBUG source=gpu.go:521 msg="discovered GPU libraries" paths=[/usr/lib/x86_64-linux-gnu/libcuda.so.555.42.06] Jul 22 15:27:35 quorra ollama[3678911]: CUDA driver version: 12.5 Jul 22 15:27:35 quorra ollama[3678911]: time=2024-07-22T15:27:35.903Z level=DEBUG source=gpu.go:124 msg="detected GPUs" count=1 library=/usr/lib/x86_64-linux-gnu/libcuda.so.555.42.06 Jul 22 15:27:35 quorra ollama[3678911]: [GPU-007c9d9a-8177-bd6f-7654-45652102b937] CUDA totalMem 15981 mb Jul 22 15:27:35 quorra ollama[3678911]: [GPU-007c9d9a-8177-bd6f-7654-45652102b937] CUDA freeMem 15763 mb Jul 22 15:27:35 quorra ollama[3678911]: [GPU-007c9d9a-8177-bd6f-7654-45652102b937] Compute Capability 8.9 Jul 22 15:27:36 quorra ollama[3678911]: time=2024-07-22T15:27:36.027Z level=DEBUG source=gpu.go:468 msg="Searching for GPU library" name=libze_intel_gpu.so Jul 22 15:27:36 quorra ollama[3678911]: time=2024-07-22T15:27:36.027Z level=DEBUG source=gpu.go:487 msg="gpu library search" globs="[/libze_intel_gpu.so* /usr/lib/x86_64-linux-gnu/libze_intel_gpu.so* /usr/lib*/libze_intel_gpu.so*]" Jul 22 15:27:36 quorra ollama[3678911]: time=2024-07-22T15:27:36.027Z level=DEBUG source=gpu.go:521 msg="discovered GPU libraries" paths=[] Jul 22 15:27:36 quorra ollama[3678911]: releasing cuda driver library Jul 22 15:27:36 quorra ollama[3678911]: panic: runtime error: invalid memory address or nil pointer dereference Jul 22 15:27:36 quorra ollama[3678911]: [signal SIGSEGV: segmentation violation code=0x1 addr=0xc pc=0x832ad7] Jul 22 15:27:36 quorra ollama[3678911]: goroutine 1 [running]: Jul 22 15:27:36 quorra ollama[3678911]: github.com/ollama/ollama/gpu.GetGPUInfo() Jul 22 15:27:36 quorra ollama[3678911]: github.com/ollama/ollama/gpu/gpu.go:313 +0xdf7 Jul 22 15:27:36 quorra ollama[3678911]: github.com/ollama/ollama/server.Serve({0x1de902f8, 0xc000709b00}) Jul 22 15:27:36 quorra ollama[3678911]: github.com/ollama/ollama/server/routes.go:1176 +0x7a5 Jul 22 15:27:36 quorra ollama[3678911]: github.com/ollama/ollama/cmd.RunServer(0xc00004cd00?, {0x1e723860?, 0x4?, 0x12a4ec5?}) Jul 22 15:27:36 quorra ollama[3678911]: github.com/ollama/ollama/cmd/cmd.go:1084 +0xfa Jul 22 15:27:36 quorra ollama[3678911]: github.com/spf13/cobra.(*Command).execute(0xc000174308, {0x1e723860, 0x0, 0x0}) Jul 22 15:27:36 quorra ollama[3678911]: github.com/spf13/cobra@v1.7.0/command.go:940 +0x882 Jul 22 15:27:36 quorra ollama[3678911]: github.com/spf13/cobra.(*Command).ExecuteC(0xc000123508) Jul 22 15:27:36 quorra ollama[3678911]: github.com/spf13/cobra@v1.7.0/command.go:1068 +0x3a5 Jul 22 15:27:36 quorra ollama[3678911]: github.com/spf13/cobra.(*Command).Execute(...) Jul 22 15:27:36 quorra ollama[3678911]: github.com/spf13/cobra@v1.7.0/command.go:992 Jul 22 15:27:36 quorra ollama[3678911]: github.com/spf13/cobra.(*Command).ExecuteContext(...) Jul 22 15:27:36 quorra ollama[3678911]: github.com/spf13/cobra@v1.7.0/command.go:985 Jul 22 15:27:36 quorra ollama[3678911]: main.main() Jul 22 15:27:36 quorra ollama[3678911]: github.com/ollama/ollama/main.go:11 +0x4d Jul 22 15:27:36 quorra systemd[1]: ollama.service: Main process exited, code=exited, status=2/INVALIDARGUMENT Jul 22 15:27:36 quorra systemd[1]: ollama.service: Failed with result 'exit-code'. Jul 22 15:27:36 quorra systemd[1]: ollama.service: Consumed 4.799s CPU time. Jul 22 15:27:39 quorra systemd[1]: ollama.service: Scheduled restart job, restart counter is at 29. Jul 22 15:27:39 quorra systemd[1]: Stopped Ollama Service. Jul 22 15:27:39 quorra systemd[1]: ollama.service: Consumed 4.799s CPU time. ```

GiteaMirror commented

@TheSpaceGod commented on GitHub (Aug 7, 2024):

Any idea how this ticket will affect this effort? Is IPEX or Vulkan the better route to go for Intel GPUs?
https://github.com/ollama/ollama/issues/2033

@TheSpaceGod commented on GitHub (Aug 7, 2024): Any idea how this ticket will affect this effort? Is IPEX or Vulkan the better route to go for Intel GPUs? https://github.com/ollama/ollama/issues/2033

GiteaMirror commented

2026-04-12 10:32:35 -05:00

@asknight1980 commented on GitHub (Aug 9, 2024):

Any idea how this ticket will affect this effort? Is IPEX or Vulkan the better route to go for Intel GPUs? #2033

It seems as if IPEX is the only way at this time, and that's only if you use Ubuntu 22.04. It doesn't work at all for me on 24.04. Intel has released their own guidance here: https://www.intel.com/content/www/us/en/content-details/826081/running-ollama-with-open-webui-on-intel-hardware-platform.html Only follow this guide if you can babysit the system it is installed on at every reboot because this guide does not enable any automatic service startups like Ollama and OpenWebUI include/intend as default. Very clunky, if you can even get it to work. I'm almost to the point of discarding my Intel GPU's in favor of amd/nvidia because those simply work so much easier.

@asknight1980 commented on GitHub (Aug 9, 2024): > Any idea how this ticket will affect this effort? Is IPEX or Vulkan the better route to go for Intel GPUs? #2033 It seems as if IPEX is the *only* way at this time, and that's only if you use Ubuntu 22.04. It doesn't work at all for me on 24.04. Intel has released their own guidance here: [https://www.intel.com/content/www/us/en/content-details/826081/running-ollama-with-open-webui-on-intel-hardware-platform.html](url) Only follow this guide if you can babysit the system it is installed on at every reboot because this guide does not enable any automatic service startups like Ollama and OpenWebUI include/intend as default. Very clunky, if you can even get it to work. I'm almost to the point of discarding my Intel GPU's in favor of amd/nvidia because those simply work so much easier.

GiteaMirror commented

@TheSpaceGod commented on GitHub (Aug 9, 2024):

It seems as if IPEX is the only way at this time, and that's only if you use Ubuntu 22.04. It doesn't work at all for me on 24.04. Intel has released their own guidance here: https://www.intel.com/content/www/us/en/content-details/826081/running-ollama-with-open-webui-on-intel-hardware-platform.html Only follow this guide if you can babysit the system it is installed on at every reboot because this guide does not enable any automatic service startups like Ollama and OpenWebUI include/intend as default. Very clunky, if you can even get it to work. I'm almost to the point of discarding my Intel GPU's in favor of amd/nvidia because those simply work so much easier.

Based on this chart, most all the Intel GPUs/iGPUs should support Vulkan 1.2+: https://www.intel.com/content/www/us/en/support/articles/000005524/graphics.html

I am really hoping this is an easier implementation route, because I agree, IPEX seems pretty hard to use in its current form. Even if Vulkan is slower than IPEX, some Intel GPU support will be better than nothing.

@TheSpaceGod commented on GitHub (Aug 9, 2024): > It seems as if IPEX is the _only_ way at this time, and that's only if you use Ubuntu 22.04. It doesn't work at all for me on 24.04. Intel has released their own guidance here: [https://www.intel.com/content/www/us/en/content-details/826081/running-ollama-with-open-webui-on-intel-hardware-platform.html](url) Only follow this guide if you can babysit the system it is installed on at every reboot because this guide does not enable any automatic service startups like Ollama and OpenWebUI include/intend as default. Very clunky, if you can even get it to work. I'm almost to the point of discarding my Intel GPU's in favor of amd/nvidia because those simply work so much easier. Based on this chart, most all the Intel GPUs/iGPUs should support Vulkan 1.2+: https://www.intel.com/content/www/us/en/support/articles/000005524/graphics.html I am really hoping this is an easier implementation route, because I agree, IPEX seems pretty hard to use in its current form. Even if Vulkan is slower than IPEX, some Intel GPU support will be better than nothing.

GiteaMirror commented

2026-04-12 10:32:35 -05:00

@celesrenata commented on GitHub (Aug 13, 2024):

Is there any possibility to manual / auto detect internal gpu size? it seems igpu is detected as a oneapi compute device
"inference compute" id=0 library=oneapi compute="" driver=0.0 name="Intel(R) UHD Graphics" total="0 B" available="0 B"
But it seems that it is not correctly detecting igpu memory size Note this is what i see using Arc A380
"inference compute" id=0 library=oneapi compute="" driver=0.0 name="Intel(R) Arc(TM) A380 Graphics" total="5.9 GiB" available="5.6 GiB"
I am guessing this is what prevent igpu from working?

I am also running into the 0 memory available IGPU issue, borrowing configs from @MordragT in NixOS on Intel 185H chips.

@celesrenata commented on GitHub (Aug 13, 2024): > Is there any possibility to manual / auto detect internal gpu size? it seems igpu is detected as a oneapi compute device > > ``` > "inference compute" id=0 library=oneapi compute="" driver=0.0 name="Intel(R) UHD Graphics" total="0 B" available="0 B" > ``` > > But it seems that it is not correctly detecting igpu memory size Note this is what i see using Arc A380 > > ``` > "inference compute" id=0 library=oneapi compute="" driver=0.0 name="Intel(R) Arc(TM) A380 Graphics" total="5.9 GiB" available="5.6 GiB" > ``` > > I am guessing this is what prevent igpu from working? I am also running into the 0 memory available IGPU issue, borrowing configs from @MordragT in NixOS on Intel 185H chips.

GiteaMirror commented

2026-04-12 10:32:35 -05:00

@slyoldfox commented on GitHub (Aug 14, 2024):

Is there any possibility to manual / auto detect internal gpu size? it seems igpu is detected as a oneapi compute device
"inference compute" id=0 library=oneapi compute="" driver=0.0 name="Intel(R) UHD Graphics" total="0 B" available="0 B"
But it seems that it is not correctly detecting igpu memory size Note this is what i see using Arc A380
"inference compute" id=0 library=oneapi compute="" driver=0.0 name="Intel(R) Arc(TM) A380 Graphics" total="5.9 GiB" available="5.6 GiB"
I am guessing this is what prevent igpu from working?

I am also running into the 0 memory available IGPU issue, borrowing configs from @MordragT in NixOS on Intel 185H chips.

I was seeing exactly the same issue

@slyoldfox commented on GitHub (Aug 14, 2024): > > Is there any possibility to manual / auto detect internal gpu size? it seems igpu is detected as a oneapi compute device > > > > > > ``` > > > "inference compute" id=0 library=oneapi compute="" driver=0.0 name="Intel(R) UHD Graphics" total="0 B" available="0 B" > > > ``` > > > > > > But it seems that it is not correctly detecting igpu memory size Note this is what i see using Arc A380 > > > > > > ``` > > > "inference compute" id=0 library=oneapi compute="" driver=0.0 name="Intel(R) Arc(TM) A380 Graphics" total="5.9 GiB" available="5.6 GiB" > > > ``` > > > > > > I am guessing this is what prevent igpu from working? > > > > I am also running into the 0 memory available IGPU issue, borrowing configs from @MordragT in NixOS on Intel 185H chips. I was seeing exactly the same issue

GiteaMirror commented

@celesrenata commented on GitHub (Aug 19, 2024):

I am trying another route, I have build SR-IOV support for my ARC iGPU, and tested it successfully in Kube with plex. Once RAM arrives today, I will attempt to see if I can run OneAPI/IPEX-LLM from kubevirts to give to Ollama. My attempt yesterday showed that it offloaded to CPU, but I had no RAM left. I'll try to update this thread if I have any success.

@celesrenata commented on GitHub (Aug 19, 2024): I am trying another route, I have build SR-IOV support for my ARC iGPU, and tested it successfully in Kube with plex. Once RAM arrives today, I will attempt to see if I can run OneAPI/IPEX-LLM from kubevirts to give to Ollama. My attempt yesterday showed that it offloaded to CPU, but I had no RAM left. I'll try to update this thread if I have any success.

GiteaMirror commented

@sambartik commented on GitHub (Aug 21, 2024):

Any idea how this ticket will affect this effort? Is IPEX or Vulkan the better route to go for Intel GPUs? #2033

It seems as if IPEX is the only way at this time, and that's only if you use Ubuntu 22.04. It doesn't work at all for me on 24.04. Intel has released their own guidance here: https://www.intel.com/content/www/us/en/content-details/826081/running-ollama-with-open-webui-on-intel-hardware-platform.html Only follow this guide if you can babysit the system it is installed on at every reboot because this guide does not enable any automatic service startups like Ollama and OpenWebUI include/intend as default. Very clunky, if you can even get it to work. I'm almost to the point of discarding my Intel GPU's in favor of amd/nvidia because those simply work so much easier.

Hi, I am on Ubuntu 24.04 LTS and got it working by using their container image intelanalytics/ipex-llm-inference-cpp-xpu:latest, more info here.

The only caveat was that the container was not able to detect my GPU. Digging deeper, I found that my kernel that came with the ubuntu - 6.8.0-40-generic was causing the issue. The workaround until it gets fixed was to set these environmental variables:

      - NEOReadDebugKeys=1
      - OverrideGpuAddressSpace=48

After that I got my GPU detected. Also, because it is docker, I don't need to worry about service startup as it is handled by docker.

@sambartik commented on GitHub (Aug 21, 2024): > > Any idea how this ticket will affect this effort? Is IPEX or Vulkan the better route to go for Intel GPUs? #2033 > > It seems as if IPEX is the _only_ way at this time, and that's only if you use Ubuntu 22.04. It doesn't work at all for me on 24.04. Intel has released their own guidance here: [https://www.intel.com/content/www/us/en/content-details/826081/running-ollama-with-open-webui-on-intel-hardware-platform.html](url) Only follow this guide if you can babysit the system it is installed on at every reboot because this guide does not enable any automatic service startups like Ollama and OpenWebUI include/intend as default. Very clunky, if you can even get it to work. I'm almost to the point of discarding my Intel GPU's in favor of amd/nvidia because those simply work so much easier. Hi, I am on Ubuntu 24.04 LTS and got it working by using their container image `intelanalytics/ipex-llm-inference-cpp-xpu:latest`, more info [here](https://github.com/intel-analytics/ipex-llm/blob/main/docs/mddocs/DockerGuides/docker_cpp_xpu_quickstart.md). The only caveat was that the container was not able to detect my GPU. Digging deeper, I found that my kernel that came with the ubuntu - 6.8.0-40-generic was causing the issue. The workaround until it gets fixed was to set these environmental variables: ``` - NEOReadDebugKeys=1 - OverrideGpuAddressSpace=48 ``` After that I got my GPU detected. Also, because it is docker, I don't need to worry about service startup as it is handled by docker.

GiteaMirror commented

@Xyz00777 commented on GitHub (Sep 3, 2024):

im a little bit confused atm, what is the state now atm? Because I have ollama running natively on my linux debian vm with a pass throught Arc 770 without container. I have the environment option for intel GPUs enabled and I can see it with lspci

# lspci
00:10.0 VGA compatible controller: Intel Corporation DG2 [Arc A770] (rev 08)

but when i start the ollama service it says that it cant find any GPU

Sep 03 21:20:23 ollama ollama[988]: 2024/09/03 21:20:23 routes.go:1125: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:true OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QU>
Sep 03 21:20:23 ollama ollama[988]: time=2024-09-03T21:20:23.996+02:00 level=INFO source=images.go:753 msg="total blobs: 4"
Sep 03 21:20:24 ollama ollama[988]: time=2024-09-03T21:20:24.010+02:00 level=INFO source=images.go:760 msg="total unused blobs removed: 0"
Sep 03 21:20:24 ollama ollama[988]: time=2024-09-03T21:20:24.015+02:00 level=INFO source=routes.go:1172 msg="Listening on [::]:11434 (version 0.3.9)"
Sep 03 21:20:24 ollama ollama[988]: time=2024-09-03T21:20:24.017+02:00 level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama817142681/runners
Sep 03 21:20:37 ollama ollama[988]: time=2024-09-03T21:20:37.200+02:00 level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu cpu_avx cpu_avx2 cuda_v11 cuda_v12 rocm_v60102]"
Sep 03 21:20:37 ollama ollama[988]: time=2024-09-03T21:20:37.200+02:00 level=INFO source=gpu.go:200 msg="looking for compatible GPUs"
Sep 03 21:20:37 ollama ollama[988]: time=2024-09-03T21:20:37.214+02:00 level=INFO source=gpu.go:347 msg="no compatible GPUs were discovered"
Sep 03 21:20:37 ollama ollama[988]: time=2024-09-03T21:20:37.214+02:00 level=INFO source=types.go:107 msg="inference compute" id=0 library=cpu variant=avx2 compute="" driver=0.0 name="" total="15.5 GiB" available="14.5 GiB"
Sep 03 21:20:44 ollama ollama[988]: [GIN] 2024/09/03 - 21:20:44 | 200 |    7.026253ms |      172.17.0.2 | GET      "/api/tags"
Sep 03 21:21:09 ollama ollama[988]: time=2024-09-03T21:21:09.653+02:00 level=INFO source=memory.go:309 msg="offload to cpu" layers.requested=-1 layers.model=33 layers.offload=0 layers.split="" memory.available="[14.1 GiB]" memory.required.full="9.2 GiB" memory.required.partial="0 B" memory.required.kv="1.0 GiB" memory.required.allocations="[9.2 GiB]" memory.weights.total=">

@Xyz00777 commented on GitHub (Sep 3, 2024): im a little bit confused atm, what is the state now atm? Because I have ollama running natively on my linux debian vm with a pass throught Arc 770 without container. I have the environment option for intel GPUs enabled and I can see it with lspci ``` # lspci 00:10.0 VGA compatible controller: Intel Corporation DG2 [Arc A770] (rev 08) ``` but when i start the ollama service it says that it cant find any GPU ``` Sep 03 21:20:23 ollama ollama[988]: 2024/09/03 21:20:23 routes.go:1125: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:true OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QU> Sep 03 21:20:23 ollama ollama[988]: time=2024-09-03T21:20:23.996+02:00 level=INFO source=images.go:753 msg="total blobs: 4" Sep 03 21:20:24 ollama ollama[988]: time=2024-09-03T21:20:24.010+02:00 level=INFO source=images.go:760 msg="total unused blobs removed: 0" Sep 03 21:20:24 ollama ollama[988]: time=2024-09-03T21:20:24.015+02:00 level=INFO source=routes.go:1172 msg="Listening on [::]:11434 (version 0.3.9)" Sep 03 21:20:24 ollama ollama[988]: time=2024-09-03T21:20:24.017+02:00 level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama817142681/runners Sep 03 21:20:37 ollama ollama[988]: time=2024-09-03T21:20:37.200+02:00 level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu cpu_avx cpu_avx2 cuda_v11 cuda_v12 rocm_v60102]" Sep 03 21:20:37 ollama ollama[988]: time=2024-09-03T21:20:37.200+02:00 level=INFO source=gpu.go:200 msg="looking for compatible GPUs" Sep 03 21:20:37 ollama ollama[988]: time=2024-09-03T21:20:37.214+02:00 level=INFO source=gpu.go:347 msg="no compatible GPUs were discovered" Sep 03 21:20:37 ollama ollama[988]: time=2024-09-03T21:20:37.214+02:00 level=INFO source=types.go:107 msg="inference compute" id=0 library=cpu variant=avx2 compute="" driver=0.0 name="" total="15.5 GiB" available="14.5 GiB" Sep 03 21:20:44 ollama ollama[988]: [GIN] 2024/09/03 - 21:20:44 | 200 | 7.026253ms | 172.17.0.2 | GET "/api/tags" Sep 03 21:21:09 ollama ollama[988]: time=2024-09-03T21:21:09.653+02:00 level=INFO source=memory.go:309 msg="offload to cpu" layers.requested=-1 layers.model=33 layers.offload=0 layers.split="" memory.available="[14.1 GiB]" memory.required.full="9.2 GiB" memory.required.partial="0 B" memory.required.kv="1.0 GiB" memory.required.allocations="[9.2 GiB]" memory.weights.total="> ```

GiteaMirror commented

2026-04-12 10:32:37 -05:00

@tannisroot commented on GitHub (Sep 3, 2024):

im a little bit confused atm, what is the state now atm? Because I have ollama running natively on my linux debian vm with a pass throught Arc 770 without container. I have the environment option for intel GPUs enabled and I can see it with lspci

# lspci
00:10.0 VGA compatible controller: Intel Corporation DG2 [Arc A770] (rev 08)

but when i start the ollama service it says that it cant find any GPU

Sep 03 21:20:23 ollama ollama[988]: 2024/09/03 21:20:23 routes.go:1125: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:true OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QU>
Sep 03 21:20:23 ollama ollama[988]: time=2024-09-03T21:20:23.996+02:00 level=INFO source=images.go:753 msg="total blobs: 4"
Sep 03 21:20:24 ollama ollama[988]: time=2024-09-03T21:20:24.010+02:00 level=INFO source=images.go:760 msg="total unused blobs removed: 0"
Sep 03 21:20:24 ollama ollama[988]: time=2024-09-03T21:20:24.015+02:00 level=INFO source=routes.go:1172 msg="Listening on [::]:11434 (version 0.3.9)"
Sep 03 21:20:24 ollama ollama[988]: time=2024-09-03T21:20:24.017+02:00 level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama817142681/runners
Sep 03 21:20:37 ollama ollama[988]: time=2024-09-03T21:20:37.200+02:00 level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu cpu_avx cpu_avx2 cuda_v11 cuda_v12 rocm_v60102]"
Sep 03 21:20:37 ollama ollama[988]: time=2024-09-03T21:20:37.200+02:00 level=INFO source=gpu.go:200 msg="looking for compatible GPUs"
Sep 03 21:20:37 ollama ollama[988]: time=2024-09-03T21:20:37.214+02:00 level=INFO source=gpu.go:347 msg="no compatible GPUs were discovered"
Sep 03 21:20:37 ollama ollama[988]: time=2024-09-03T21:20:37.214+02:00 level=INFO source=types.go:107 msg="inference compute" id=0 library=cpu variant=avx2 compute="" driver=0.0 name="" total="15.5 GiB" available="14.5 GiB"
Sep 03 21:20:44 ollama ollama[988]: [GIN] 2024/09/03 - 21:20:44 | 200 |    7.026253ms |      172.17.0.2 | GET      "/api/tags"
Sep 03 21:21:09 ollama ollama[988]: time=2024-09-03T21:21:09.653+02:00 level=INFO source=memory.go:309 msg="offload to cpu" layers.requested=-1 layers.model=33 layers.offload=0 layers.split="" memory.available="[14.1 GiB]" memory.required.full="9.2 GiB" memory.required.partial="0 B" memory.required.kv="1.0 GiB" memory.required.allocations="[9.2 GiB]" memory.weights.total=">

How did you compile Ollama?

@tannisroot commented on GitHub (Sep 3, 2024): > im a little bit confused atm, what is the state now atm? Because I have ollama running natively on my linux debian vm with a pass throught Arc 770 without container. I have the environment option for intel GPUs enabled and I can see it with lspci > > ``` > # lspci > 00:10.0 VGA compatible controller: Intel Corporation DG2 [Arc A770] (rev 08) > ``` > > but when i start the ollama service it says that it cant find any GPU > > ``` > Sep 03 21:20:23 ollama ollama[988]: 2024/09/03 21:20:23 routes.go:1125: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:true OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QU> > Sep 03 21:20:23 ollama ollama[988]: time=2024-09-03T21:20:23.996+02:00 level=INFO source=images.go:753 msg="total blobs: 4" > Sep 03 21:20:24 ollama ollama[988]: time=2024-09-03T21:20:24.010+02:00 level=INFO source=images.go:760 msg="total unused blobs removed: 0" > Sep 03 21:20:24 ollama ollama[988]: time=2024-09-03T21:20:24.015+02:00 level=INFO source=routes.go:1172 msg="Listening on [::]:11434 (version 0.3.9)" > Sep 03 21:20:24 ollama ollama[988]: time=2024-09-03T21:20:24.017+02:00 level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama817142681/runners > Sep 03 21:20:37 ollama ollama[988]: time=2024-09-03T21:20:37.200+02:00 level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu cpu_avx cpu_avx2 cuda_v11 cuda_v12 rocm_v60102]" > Sep 03 21:20:37 ollama ollama[988]: time=2024-09-03T21:20:37.200+02:00 level=INFO source=gpu.go:200 msg="looking for compatible GPUs" > Sep 03 21:20:37 ollama ollama[988]: time=2024-09-03T21:20:37.214+02:00 level=INFO source=gpu.go:347 msg="no compatible GPUs were discovered" > Sep 03 21:20:37 ollama ollama[988]: time=2024-09-03T21:20:37.214+02:00 level=INFO source=types.go:107 msg="inference compute" id=0 library=cpu variant=avx2 compute="" driver=0.0 name="" total="15.5 GiB" available="14.5 GiB" > Sep 03 21:20:44 ollama ollama[988]: [GIN] 2024/09/03 - 21:20:44 | 200 | 7.026253ms | 172.17.0.2 | GET "/api/tags" > Sep 03 21:21:09 ollama ollama[988]: time=2024-09-03T21:21:09.653+02:00 level=INFO source=memory.go:309 msg="offload to cpu" layers.requested=-1 layers.model=33 layers.offload=0 layers.split="" memory.available="[14.1 GiB]" memory.required.full="9.2 GiB" memory.required.partial="0 B" memory.required.kv="1.0 GiB" memory.required.allocations="[9.2 GiB]" memory.weights.total="> > ``` How did you compile Ollama?

GiteaMirror commented

@Xyz00777 commented on GitHub (Sep 3, 2024):

i didnt compiled it, i just downloaded it with ansible, thats what i had done:

- name: Download and extract Ollama package
  hosts: localhost
  become: yes
  tasks:
    - name: Download Ollama tarball
      get_url:
        url: https://ollama.com/download/ollama-linux-amd64.tgz
        dest: /tmp/ollama-linux-amd64.tgz
        mode: '0644'

    - name: Extract Ollama tarball to /usr
      unarchive:
        src: /tmp/ollama-linux-amd64.tgz
        dest: /usr
        remote_src: yes
        extra_opts: [--strip-components=1]
        creates: /usr/ollama  # This prevents re-extraction if the target already exists

    - name: Create Ollama user
      user:
        name: ollama
        system: yes
        shell: /bin/false
        home: /usr/share/ollama
        create_home: yes

    - name: Create systemd service file for Ollama
      copy:
        dest: /etc/systemd/system/ollama.service
        content: |
          [Unit]
          Description=Ollama Service
          After=network-online.target

          [Service]
          Environment="HOME=/mnt/ollama/llms"
          Environment="OLLAMA_INTEL_GPU=1"
          Environment="OLLAMA_HOST=0.0.0.0"
          ExecStart=/usr/bin/ollama serve
          User=ollama
          Group=ollama
          Restart=always
          RestartSec=3

          [Install]
          WantedBy=default.target
        owner: root
        group: root
        mode: '0644'
      notify: reload systemd

    - name: Enable and start Ollama service
      systemd:
        name: ollama
        enabled: yes
        state: started

@Xyz00777 commented on GitHub (Sep 3, 2024): i didnt compiled it, i just downloaded it with ansible, thats what i had done: ``` - name: Download and extract Ollama package hosts: localhost become: yes tasks: - name: Download Ollama tarball get_url: url: https://ollama.com/download/ollama-linux-amd64.tgz dest: /tmp/ollama-linux-amd64.tgz mode: '0644' - name: Extract Ollama tarball to /usr unarchive: src: /tmp/ollama-linux-amd64.tgz dest: /usr remote_src: yes extra_opts: [--strip-components=1] creates: /usr/ollama # This prevents re-extraction if the target already exists - name: Create Ollama user user: name: ollama system: yes shell: /bin/false home: /usr/share/ollama create_home: yes - name: Create systemd service file for Ollama copy: dest: /etc/systemd/system/ollama.service content: | [Unit] Description=Ollama Service After=network-online.target [Service] Environment="HOME=/mnt/ollama/llms" Environment="OLLAMA_INTEL_GPU=1" Environment="OLLAMA_HOST=0.0.0.0" ExecStart=/usr/bin/ollama serve User=ollama Group=ollama Restart=always RestartSec=3 [Install] WantedBy=default.target owner: root group: root mode: '0644' notify: reload systemd - name: Enable and start Ollama service systemd: name: ollama enabled: yes state: started ```

GiteaMirror commented

2026-04-12 10:32:37 -05:00

@tannisroot commented on GitHub (Sep 3, 2024):

Release version of Ollama is not compiled with OneAPI (Intel) support.
You need to fetch the repo, install level zero drivers, intel-basekit (info on Intel's website), activate runtime and then compile with certain envars enabled

@tannisroot commented on GitHub (Sep 3, 2024): Release version of Ollama is not compiled with OneAPI (Intel) support. You need to fetch the repo, install level zero drivers, intel-basekit (info on Intel's website), activate runtime and then compile with certain envars enabled

GiteaMirror commented

2026-04-12 10:32:37 -05:00

@Xyz00777 commented on GitHub (Sep 3, 2024):

Release version of Ollama is not compiled with OneAPI (Intel) support. You need to fetch the repo, install level zero drivers, intel-basekit (info on Intel's website), activate runtime and then compile with certain envars enabled

so i have to wait until at least the intel compatibel linux package is updated to can be downloaded or i have to compile it and have to have installed packages you mentioned? or do i even than need these packages if the intel compatible package for download got releases (sorry im a beginner in these area and have no experience)

@Xyz00777 commented on GitHub (Sep 3, 2024): > Release version of Ollama is not compiled with OneAPI (Intel) support. You need to fetch the repo, install level zero drivers, intel-basekit (info on Intel's website), activate runtime and then compile with certain envars enabled so i have to wait until at least the intel compatibel linux package is updated to can be downloaded or i have to compile it and have to have installed packages you mentioned? or do i even than need these packages if the intel compatible package for download got releases (sorry im a beginner in these area and have no experience)

GiteaMirror commented

2026-04-12 10:32:38 -05:00

@tannisroot commented on GitHub (Sep 3, 2024):

Release version of Ollama is not compiled with OneAPI (Intel) support. You need to fetch the repo, install level zero drivers, intel-basekit (info on Intel's website), activate runtime and then compile with certain envars enabled

so i have to wait until at least the intel compatibel linux package is updated to can be downloaded or i have to compile it and have to have installed packages you mentioned? or do i even than need these packages if the intel compatible package for download got releases (sorry im a beginner in these area and have no experience)

I believe Ollama does plan to provide an Intel supporting package at some point in the near future.
Meanwhile you can try building on your own. If you need help with that, @ me on the official Ollama discord, I'll be glad to assist you during european day hours!

@tannisroot commented on GitHub (Sep 3, 2024): > > Release version of Ollama is not compiled with OneAPI (Intel) support. You need to fetch the repo, install level zero drivers, intel-basekit (info on Intel's website), activate runtime and then compile with certain envars enabled > > so i have to wait until at least the intel compatibel linux package is updated to can be downloaded or i have to compile it and have to have installed packages you mentioned? or do i even than need these packages if the intel compatible package for download got releases (sorry im a beginner in these area and have no experience) I believe Ollama does plan to provide an Intel supporting package at some point in the near future. Meanwhile you can try building on your own. If you need help with that, @ me on the official Ollama discord, I'll be glad to assist you during european day hours!

GiteaMirror commented

2026-04-12 10:32:38 -05:00

@xiangyang-95 commented on GitHub (Oct 4, 2024):

It would be great if we could download, extract, and run Ollama on an Intel GPU directly. The example would be like

curl -L https://ollama.com/download/ollama-linux-amd64-sycl.tgz -o ollama-linux-amd64-sycl.tgz
sudo tar -C /usr -xzf ollama-linux-amd64-sycl.tgz

I am willing to contribute this feature if needed.

@xiangyang-95 commented on GitHub (Oct 4, 2024): It would be great if we could download, extract, and run Ollama on an Intel GPU directly. The example would be like ``` curl -L https://ollama.com/download/ollama-linux-amd64-sycl.tgz -o ollama-linux-amd64-sycl.tgz sudo tar -C /usr -xzf ollama-linux-amd64-sycl.tgz ``` I am willing to contribute this feature if needed.

GiteaMirror commented

2026-04-12 10:32:39 -05:00

@semidark commented on GitHub (Oct 10, 2024):

With the help of @tannisroot, I successfully compiled Ollama with Intel GPU support from source.

The process was quite straightforward, and everything went smoothly. I had high hopes since I've been running llama.cpp standalone with my iGPU for the past few weeks. However, when I ran Ollama, it detected my iGPU, but the integrated llama.cpp server did not use it.

I suspect this is related to Ollama's handling of the unified memory on the iGPU, as mentioned by @dhiltgen in this comment .

Here is some output where Ollama reports that the memory size is 0 Bytes:

time=2024-10-10T22:44:15.930+02:00 level=INFO source=gpu.go:199 msg="looking for compatible GPUs"
time=2024-10-10T22:44:15.948+02:00 level=INFO source=types.go:107 msg="inference compute" id=0 library=oneapi variant="" compute="" driver=0.0 name="Intel(R) Iris(R) Xe Graphics" total="0 B" available="0 B"

To investigate, I ran the ollama_llama_server directly without using Ollama, and it seemed to recognize my iGPU and Unified RAM as expected:

cd llm/build/linux/amd64/oneapi/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:.
./ollama_llama_server -m ~/src/llama.cpp/models/gemma-2-2b-it-Q8_0.gguf -c 4000 -ngl 28 --host 127.0.0.1 --port 3000

[...]
ggml_sycl_init: SYCL_USE_XMX: yes
ggml_sycl_init: found 1 SYCL devices:
get_memory_info: [warning] ext_intel_free_memory is not supported (export/set ZES_ENABLE_SYSMAN=1 to support), use total memory as free memory
llm_load_tensors: ggml ctx size =    0.26 MiB
llm_load_tensors: offloading 26 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 27/27 layers to GPU
llm_load_tensors:      SYCL0 buffer size =  2649.78 MiB
llm_load_tensors:        CPU buffer size =   597.66 MiB
[...]

So, how can I get Ollama to recognize the Unified Memory on my iGPU? Could we consider a quick fix to the GPU identification code, perhaps forcing Ollama to work with Unified Memory when the ZES_ENABLE_SYSMAN=1 environment variable is set?

@semidark commented on GitHub (Oct 10, 2024): With the help of @tannisroot, I successfully compiled Ollama with Intel GPU support from source. The process was quite straightforward, and everything went smoothly. I had high hopes since I've been running llama.cpp standalone with my iGPU for the past few weeks. However, when I ran Ollama, it detected my iGPU, but the integrated llama.cpp server did not use it. I suspect this is related to Ollama's handling of the unified memory on the iGPU, as mentioned by @dhiltgen in [this comment](https://github.com/ollama/ollama/issues/5387#issuecomment-2204423270) . Here is some output where Ollama reports that the memory size is 0 Bytes: ``` time=2024-10-10T22:44:15.930+02:00 level=INFO source=gpu.go:199 msg="looking for compatible GPUs" time=2024-10-10T22:44:15.948+02:00 level=INFO source=types.go:107 msg="inference compute" id=0 library=oneapi variant="" compute="" driver=0.0 name="Intel(R) Iris(R) Xe Graphics" total="0 B" available="0 B" ``` To investigate, I ran the `ollama_llama_server` directly without using Ollama, and it seemed to recognize my iGPU and Unified RAM as expected: ``` cd llm/build/linux/amd64/oneapi/bin export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:. ./ollama_llama_server -m ~/src/llama.cpp/models/gemma-2-2b-it-Q8_0.gguf -c 4000 -ngl 28 --host 127.0.0.1 --port 3000 ``` ``` [...] ggml_sycl_init: SYCL_USE_XMX: yes ggml_sycl_init: found 1 SYCL devices: get_memory_info: [warning] ext_intel_free_memory is not supported (export/set ZES_ENABLE_SYSMAN=1 to support), use total memory as free memory llm_load_tensors: ggml ctx size = 0.26 MiB llm_load_tensors: offloading 26 repeating layers to GPU llm_load_tensors: offloading non-repeating layers to GPU llm_load_tensors: offloaded 27/27 layers to GPU llm_load_tensors: SYCL0 buffer size = 2649.78 MiB llm_load_tensors: CPU buffer size = 597.66 MiB [...] ``` So, how can I get Ollama to recognize the Unified Memory on my iGPU? Could we consider a quick fix to the GPU identification code, perhaps forcing Ollama to work with Unified Memory when the `ZES_ENABLE_SYSMAN=1` environment variable is set?

GiteaMirror commented

2026-04-12 10:32:39 -05:00

@Gunnarr970 commented on GitHub (Oct 11, 2024):

Here is an ipex-llm beta. It allows ollama to work on very old "HD Graphics 630" using SYCL.

@Gunnarr970 commented on GitHub (Oct 11, 2024): Here is an [ipex-llm beta](https://github.com/intel-analytics/ipex-llm/issues/12120#issuecomment-2403947706). It allows ollama to work on very old "HD Graphics 630" using SYCL.

GiteaMirror commented

2026-04-12 10:32:39 -05:00

@celesrenata commented on GitHub (Oct 13, 2024):

I am trying another route, I have build SR-IOV support for my ARC iGPU, and tested it successfully in Kube with plex. Once RAM arrives today, I will attempt to see if I can run OneAPI/IPEX-LLM from kubevirts to give to Ollama. My attempt yesterday showed that it offloaded to CPU, but I had no RAM left. I'll try to update this thread if I have any success.

I did in the end have success with my little project.
https://github.com/celesrenata/nixos-k3s-configs
specifically with Ubuntu Kubevirts. So if you want to borrow from my work, I suggest looking into: https://github.com/celesrenata/nixos-k3s-configs/blob/main/kubevirt/ipex-1x/bootstrap-ipex-fleet.sh works with Ubuntu 24.04 LTS

@celesrenata commented on GitHub (Oct 13, 2024): > I am trying another route, I have build SR-IOV support for my ARC iGPU, and tested it successfully in Kube with plex. Once RAM arrives today, I will attempt to see if I can run OneAPI/IPEX-LLM from kubevirts to give to Ollama. My attempt yesterday showed that it offloaded to CPU, but I had no RAM left. I'll try to update this thread if I have any success. I did in the end have success with my little project. https://github.com/celesrenata/nixos-k3s-configs specifically with Ubuntu Kubevirts. So if you want to borrow from my work, I suggest looking into: https://github.com/celesrenata/nixos-k3s-configs/blob/main/kubevirt/ipex-1x/bootstrap-ipex-fleet.sh works with Ubuntu 24.04 LTS

GiteaMirror commented

@WoutvanderAa commented on GitHub (Nov 6, 2024):

do the arc cards already work? I have a intel arc a380 in my unraid server atm and I would love to use it for ollama.

@WoutvanderAa commented on GitHub (Nov 6, 2024): do the arc cards already work? I have a intel arc a380 in my unraid server atm and I would love to use it for ollama.

GiteaMirror commented

@yurhett commented on GitHub (Nov 10, 2024):

Hi @dhiltgen,

Thank you for your hard work and dedication to improving ollama. I've reviewed the changes introduced in the 0.4 update and noticed that a significant portion of the codebase has been restructured, and the build system has transitioned to using make. Consequently, support for Intel GPUs has been excluded in this update.

However, it's worth noting that upstream llama.cpp has now officially added support for Intel GPUs. Considering this development, I would like to inquire if there are plans to integrate Intel GPU support into future releases of ollama.

Thank you for your time and consideration.

@yurhett commented on GitHub (Nov 10, 2024): Hi @dhiltgen, Thank you for your hard work and dedication to improving ollama. I've reviewed the changes introduced in the 0.4 update and noticed that a significant portion of the codebase has been restructured, and the build system has transitioned to using make. Consequently, support for Intel GPUs has been excluded in this update. However, it's worth noting that upstream llama.cpp has now officially added support for Intel GPUs. Considering this development, I would like to inquire if there are plans to integrate Intel GPU support into future releases of ollama. Thank you for your time and consideration.

GiteaMirror commented

@pepijndevos commented on GitHub (Nov 10, 2024):

It seems indeed 0.4 just does not build Intel Arc support using the method suggested above. Is there another method?

For now it seems git checkout v0.3.14 will get you... somewhere, but currently still playing whack-a-mole with compiler errors.

The reason I'm trying to build from source is that the ipex-llm bundled version appears broken
https://github.com/intel-analytics/ipex-llm/issues/12374

Update: I built from source, result:

Abort was called at 1078 line in file:
/usr/src/debug/intel-compute-runtime/compute-runtime-24.39.31294.12/shared/source/os_interface/linux/drm_neo.cpp

@pepijndevos commented on GitHub (Nov 10, 2024): It seems indeed 0.4 just does not build Intel Arc support using the method suggested above. Is there another method? For now it seems `git checkout v0.3.14` will get you... somewhere, but currently still playing whack-a-mole with compiler errors. The reason I'm trying to build from source is that the ipex-llm bundled version appears broken https://github.com/intel-analytics/ipex-llm/issues/12374 Update: I built from source, result: ``` Abort was called at 1078 line in file: /usr/src/debug/intel-compute-runtime/compute-runtime-24.39.31294.12/shared/source/os_interface/linux/drm_neo.cpp ```

GiteaMirror commented