[GH-ISSUE #1590] Add support for Intel Arc GPUs #26640

Open
opened 2026-04-22 03:02:19 -05:00 by GiteaMirror · 72 comments

Originally created by @taep96 on GitHub (Dec 18, 2023).
Original GitHub issue: https://github.com/ollama/ollama/issues/1590

Originally assigned to: @dhiltgen on GitHub.

GiteaMirror added the intel and feature request labels 2026-04-22 03:02:20 -05:00

@6543 commented on GitHub (Dec 19, 2023):

also looking forward ;)

PS: the Intel [IPEX](http://blog.nuullll.com/ipex-sd-docker-for-arc-gpu) doesn't look commonly supported - would be nice though

so a fallback option would be to use the Vulkan API as a target 🤔

@technovangelist commented on GitHub (Dec 19, 2023):

Hi, thanks so much for submitting your issue. At the moment we do not support inference using Intel's GPUs. I'll leave this issue open to track adding Intel support in the future.

@itlackey commented on GitHub (Jan 12, 2024):

+1 for IPEX support

Would it be possible to include oneAPI to support this? OpenCL is currently not working well with Intel GPUs. Vulkan may also be a decent option.

@Leo512bit commented on GitHub (Feb 4, 2024):

It looks like llama.cpp now supports SYCL for Intel GPUs. Is Arc support now possible?

https://github.com/ggerganov/llama.cpp/pull/2690

@uxdesignerhector commented on GitHub (Feb 4, 2024):

The latest [Automatic1111](https://github.com/AUTOMATIC1111) update, [1.7.0](https://github.com/AUTOMATIC1111/stable-diffusion-webui/pull/14171), included [IPEX](https://github.com/intel/intel-extension-for-pytorch) and initial support for Intel Arc GPUs on Windows; maybe someone could have a look and see what they have done to make it possible. I know this is for Windows only, but it shows that it is possible to integrate, and on Linux it should be easier since Windows support came later.

I'm aware that [WSL](https://learn.microsoft.com/en-us/windows/wsl/) may be another different beast; I remember having a lot of trouble installing Automatic1111 and accessing my Intel Arc GPU due to some limitations with the memory and privileges hardcoded into WSL.

@felipeagc commented on GitHub (Feb 12, 2024):

Hey everyone, I made some progress on adding Intel Arc support to ollama: #2458

@ghost commented on GitHub (Feb 13, 2024):

Thank you @felipeagc

@tannisroot commented on GitHub (Apr 24, 2024):

Support for SYCL/Intel GPUs would be quite interesting because:

  1. Intel offers by far the cheapest 16GB VRAM GPU, A770, costing only $279.99 and packing more than enough performance for inference. RTX 4060 Ti with the same amount of VRAM costs at least $459.99.
  2. Intel also offers the cheapest discrete GPU that is not a hot pile of garbage, the A380.
    It is a very popular choice for home servers, since it has very good transcoding compatibility with Jellyfin, and is also supported by Frigate for ML workloads.
    With 6GB of VRAM, it should be capable of running competent small models like llama3, which in combination with Home Assistant can be used to power a completely local voice assistant and destroy the likes of Alexa and Google Assistant comprehension-wise.
  3. Upcoming Battlemage GPUs might offer even more competitive hardware for inference workloads.

@Kamryx commented on GitHub (Apr 25, 2024):

Extremely eager to have support for Arc GPUs. Have an A380 idle in my home server ready to be put to use. As the above commenter said, probably the best price/performance GPU for this workload.

I have an ultra layman and loose understanding of all this stuff, but have I correctly surmised that llama.cpp essentially already has Arc support, and it just needs to be implemented/merged into Ollama? And if that’s the case, are we probably in the final stretch?

@asknight1980 commented on GitHub (May 5, 2024):

I too have an A380 sitting idle in my R520 anxiously waiting for Ollama to recognize it. Thank you all for the progress you have contributed to this.

@kozuch commented on GitHub (Jun 6, 2024):

Is this now done with the merge of https://github.com/ollama/ollama/pull/3278 that has been released in v0.1.140?

@dhiltgen commented on GitHub (Jun 6, 2024):

@kozuch not quite. It's close.

If you build locally from source, it should work, but we haven't integrated it into our official builds yet.
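
(Sketch for reference, not official instructions: roughly what a local SYCL-enabled source build looked like around these releases. The oneAPI install path and toolchain are assumed; the authoritative steps are the generate scripts in the repo, and they changed between versions.)

```sh
# Rough sketch of a local build with Intel GPU (SYCL) support, assuming the
# Intel oneAPI Base Toolkit is installed under /opt/intel/oneapi and the usual
# Go, cmake and gcc toolchain is present.
git clone https://github.com/ollama/ollama.git
cd ollama
source /opt/intel/oneapi/setvars.sh   # expose the SYCL compiler and Level Zero to the generate script
go generate ./...                     # runs the gen_linux.sh script and builds the runners
go build .                            # produces the ./ollama binary
# Later builds gate Intel GPU discovery behind an env var (see the comments below):
OLLAMA_INTEL_GPU=1 ./ollama serve
```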

@uxdesignerhector commented on GitHub (Jun 7, 2024):

@dhiltgen do you know if this will work on WSL or Windows or only Linux?

@dhiltgen commented on GitHub (Jun 7, 2024):

The Linux build is already covered in #4876 and my goal is to enable Windows as well. [This doc](https://www.intel.com/content/www/us/en/docs/oneapi/installation-guide-linux/2023-0/configure-wsl-2-for-gpu-workflows.html) implies WSL2 should work.

@marcoleder commented on GitHub (Jun 11, 2024):

Looking forward to it! Let me know once it is available for Windows :)

@kozuch commented on GitHub (Jun 12, 2024):

> @kozuch not quite. It's close.
>
> If you build locally from source, it should work, but we haven't integrated it into our official builds yet.

You are not branching the releases off main? Why was the https://github.com/ollama/ollama/pull/3278 change seen in https://github.com/ollama/ollama/compare/v0.1.39...v0.1.40 changelist then?

@WeihanLi commented on GitHub (Jun 12, 2024):

Is there a release schedule for this?

@asknight1980 commented on GitHub (Jun 14, 2024):

How can I build it to enable Intel Arc?

> Install required tools:
> go version 1.22 or higher

> /builds/ollama-0.1.44/go.mod:3: invalid go version '1.22.0': must match format 1.23

Go 1.23 has either been pulled back or isn't clearly available.

@dhiltgen commented on GitHub (Jun 19, 2024):

Unfortunately users have reported crashing in the Intel GPU management library on some windows systems, so we've had to disable it temporarily until we figure out what's causing the crash. You can re-enable it by setting OLLAMA_INTEL_GPU=1

We don't have docs explaining how to build since it's not reliable yet. You can take a look at the gen_linux.sh and gen_windows.ps1 scripts [here](https://github.com/ollama/ollama/tree/main/llm/generate) for some inspiration on the required tools.
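
(For reference, a minimal sketch of one way to set that variable when Ollama runs as the stock systemd service on Linux; this just uses the standard systemd drop-in mechanism and assumes the default ollama.service unit name.)

```sh
# Re-enable Intel GPU discovery for a systemd-managed server (assumes the
# default ollama.service unit installed by the Linux install script).
sudo systemctl edit ollama.service
# In the drop-in editor that opens, add:
#   [Service]
#   Environment="OLLAMA_INTEL_GPU=1"
# then apply it:
sudo systemctl daemon-reload
sudo systemctl restart ollama.service
```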

@dhiltgen commented on GitHub (Jun 19, 2024):

Quick update - the crash is fixed on main now, but we'll keep it behind the env var I mentioned above until we get #4876 merged and the resulting binaries validated on linux and windows with Arc GPUs.

@ConnorMeng commented on GitHub (Jun 20, 2024):

Sorry if it isn't appropriate to ask this here, but when do you think this will reach the docker image, and when might there be some documentation for that as well?

@YumingChang02 commented on GitHub (Jul 5, 2024):

Is there any possibility to manually or automatically detect the internal GPU size? It seems the iGPU is detected as a oneAPI compute device:

"inference compute" id=0 library=oneapi compute="" driver=0.0 name="Intel(R) UHD Graphics" total="0 B" available="0 B"

But it seems that it is not correctly detecting the iGPU memory size. Note this is what I see using an Arc A380:

"inference compute" id=0 library=oneapi compute="" driver=0.0 name="Intel(R) Arc(TM) A380 Graphics" total="5.9 GiB" available="5.6 GiB"

I am guessing this is what prevents the iGPU from working?

@asknight1980 commented on GitHub (Jul 5, 2024):

Are you able to do any inference at all on the Arc A380? I am showing it loading the model in GPU memory on my A380 but the processing is still happening on the CPU while the GPU sits idle.

Jul 05 18:25:13 cyka-b ollama[578885]: 2024/07/05 18:25:13 routes.go:1064: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_G>
Jul 05 18:25:17 cyka-b ollama[578885]: time=2024-07-05T18:25:17.512-05:00 level=INFO source=types.go:98 msg="inference compute" id=0 library=oneapi compute="" driver=0.0 name="Inte>

NAME ID SIZE PROCESSOR UNTIL
tinyllama:latest 2644915ede35 827 MB 100% GPU 4 minutes from now

@MordragT commented on GitHub (Jul 7, 2024):

Is there any way to make ollama find the neo driver's libigdrcl.so library for OpenCL? On my setup ollama always returns:

Jul 07 14:56:36 tom-desktop ollama[240788]: found 1 SYCL devices:
Jul 07 14:56:36 tom-desktop ollama[240788]: |  |                   |                                       |       |Max    |        |Max  |Global |                     |
Jul 07 14:56:36 tom-desktop ollama[240788]: |  |                   |                                       |       |compute|Max work|sub  |mem    |                     |
Jul 07 14:56:36 tom-desktop ollama[240788]: |ID|        Device Type|                                   Name|Version|units  |group   |group|size   |       Driver version|
Jul 07 14:56:36 tom-desktop ollama[240788]: |--|-------------------|---------------------------------------|-------|-------|--------|-----|-------|---------------------|
Jul 07 14:56:36 tom-desktop ollama[240788]: | 0| [level_zero:gpu:0]|                Intel Arc A750 Graphics|    1.3|    448|    1024|   32|  8096M|            1.3.29735|

And then a bit later:

Jul 07 14:56:36 tom-desktop ollama[240788]: Build program log for 'Intel(R) Arc(TM) A750 Graphics':
Jul 07 14:56:36 tom-desktop ollama[240788]:  -999 (Unknown PI error)Exception caught at file:/build/source/llm/llama.cpp/ggml/src/ggml-sycl.cpp, line:3121

I reproduced the error with llama-cpp, and it seems that if llama-cpp can only find the Level Zero device and not the OpenCL one, it will throw the exception.
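
(A quick way to check whether an OpenCL ICD from the Intel compute runtime is visible at all - sketched for a Debian/Ubuntu-style system. The /build/source paths above suggest NixOS, where the equivalent is exposing intel-compute-runtime / ocl-icd to the service instead, so treat the package names as assumptions.)

```sh
# Check whether the neo driver's OpenCL ICD (libigdrcl.so) is registered and
# visible; package names assume Debian/Ubuntu and differ on other distros.
sudo apt install intel-opencl-icd clinfo
cat /etc/OpenCL/vendors/intel.icd             # should contain the path to libigdrcl.so
clinfo | grep -iE 'platform name|device name'
# If clinfo shows no Intel OpenCL platform, the SYCL backend will only see the
# Level Zero device, which is the situation described above.
```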

@Yueming-Yan commented on GitHub (Jul 11, 2024):

Looking forward :)

Intel(R) Iris(R) Xe Graphics

time=2024-07-11T12:02:14.704+08:00 level=INFO source=gpu.go:205 msg="looking for compatible GPUs"
time=2024-07-11T12:02:15.136+08:00 level=INFO source=gpu.go:324 msg="no compatible GPUs were discovered"

Appending some useful links:
https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md
https://github.com/intel-analytics/ipex-llm/blob/main/docs/mddocs/Quickstart/ollama_quickstart.md


@TheSpaceGod commented on GitHub (Jul 18, 2024):

Out of curiosity, what is holding up this PR (https://github.com/ollama/ollama/pull/4876) making it to main? It looks like it's passing all the relevant PR tests.
I think this would be a real game changer for all the people running small LLM models via Docker on Intel NUC-style computers like myself.

@tannisroot commented on GitHub (Jul 18, 2024):

> Out of curiosity, what is holding up this PR (https://github.com/ollama/ollama/pull/4876) making it to main? It looks like it's passing all the relevant PR tests.
> I think this would be a real game changer for all the people running small LLM models via Docker on Intel NUC-style computers like myself.

The Windows driver for Intel is crashing with Ollama.
Honestly, as a Linux user it's a little bit annoying; I imagine the majority of people who want to use Ollama with an Intel GPU plan to do so on their Linux box.
It's also not guaranteed Intel will fix it any time soon. I remember another open source project, DXVK, encountered major crashing bugs exclusive to the Windows Intel driver, and it took years for things to get fixed afaik (if they are even fully fixed).

@lirc571 commented on GitHub (Jul 18, 2024):

Some work is being done at #5593 and on the llama.cpp side by Intel people. Looks like they are actively working on it!

@tannisroot commented on GitHub (Jul 19, 2024):

> Some work is being done at #5593 and on the llama.cpp side by Intel people. Looks like they are actively working on it!

Oh then that is very good news!

@MarkWard0110 commented on GitHub (Jul 22, 2024):

Does this include support for the integrated GPU? For example, the Intel Core i9 14900K has an integrated GPU. When I enable the feature (`OLLAMA_INTEL_GPU=1`) on Ubuntu Server 22.04, it crashes.

I am curious to know if there are dependencies that need to be installed for this to work.

Jul 22 15:27:34 quorra systemd[1]: Started Ollama Service.
Jul 22 15:27:34 quorra ollama[3678911]: 2024/07/22 15:27:34 routes.go:1096: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: OLLAMA_DEBUG:true OLLAMA_FLASH_ATTENTION:false OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:true OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MAX_VRAM:0 OLLAMA_MODELS:/usr/share/ollama/.ollama/models OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_RUNNERS_DIR: OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES:]"
Jul 22 15:27:34 quorra ollama[3678911]: time=2024-07-22T15:27:34.349Z level=INFO source=images.go:778 msg="total blobs: 81"
Jul 22 15:27:34 quorra ollama[3678911]: time=2024-07-22T15:27:34.350Z level=INFO source=images.go:785 msg="total unused blobs removed: 0"
Jul 22 15:27:34 quorra ollama[3678911]: time=2024-07-22T15:27:34.350Z level=INFO source=routes.go:1143 msg="Listening on [::]:11434 (version 0.2.7)"
Jul 22 15:27:34 quorra ollama[3678911]: time=2024-07-22T15:27:34.350Z level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama2597150250/runners
Jul 22 15:27:34 quorra ollama[3678911]: time=2024-07-22T15:27:34.350Z level=DEBUG source=payload.go:182 msg=extracting variant=cpu file=build/linux/x86_64/cpu/bin/ollama_llama_server.gz
Jul 22 15:27:34 quorra ollama[3678911]: time=2024-07-22T15:27:34.350Z level=DEBUG source=payload.go:182 msg=extracting variant=cpu_avx file=build/linux/x86_64/cpu_avx/bin/ollama_llama_server.gz
Jul 22 15:27:34 quorra ollama[3678911]: time=2024-07-22T15:27:34.350Z level=DEBUG source=payload.go:182 msg=extracting variant=cpu_avx2 file=build/linux/x86_64/cpu_avx2/bin/ollama_llama_server.gz
Jul 22 15:27:34 quorra ollama[3678911]: time=2024-07-22T15:27:34.350Z level=DEBUG source=payload.go:182 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/libcublas.so.11.gz
Jul 22 15:27:34 quorra ollama[3678911]: time=2024-07-22T15:27:34.350Z level=DEBUG source=payload.go:182 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/libcublasLt.so.11.gz
Jul 22 15:27:34 quorra ollama[3678911]: time=2024-07-22T15:27:34.350Z level=DEBUG source=payload.go:182 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/libcudart.so.11.0.gz
Jul 22 15:27:34 quorra ollama[3678911]: time=2024-07-22T15:27:34.350Z level=DEBUG source=payload.go:182 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/ollama_llama_server.gz
Jul 22 15:27:34 quorra ollama[3678911]: time=2024-07-22T15:27:34.350Z level=DEBUG source=payload.go:182 msg=extracting variant=rocm_v60102 file=build/linux/x86_64/rocm_v60102/bin/deps.txt.gz
Jul 22 15:27:34 quorra ollama[3678911]: time=2024-07-22T15:27:34.350Z level=DEBUG source=payload.go:182 msg=extracting variant=rocm_v60102 file=build/linux/x86_64/rocm_v60102/bin/ollama_llama_server.gz
Jul 22 15:27:35 quorra ollama[3678911]: time=2024-07-22T15:27:35.824Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama2597150250/runners/cpu/ollama_llama_server
Jul 22 15:27:35 quorra ollama[3678911]: time=2024-07-22T15:27:35.824Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama2597150250/runners/cpu_avx/ollama_llama_server
Jul 22 15:27:35 quorra ollama[3678911]: time=2024-07-22T15:27:35.824Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama2597150250/runners/cpu_avx2/ollama_llama_server
Jul 22 15:27:35 quorra ollama[3678911]: time=2024-07-22T15:27:35.824Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama2597150250/runners/cuda_v11/ollama_llama_server
Jul 22 15:27:35 quorra ollama[3678911]: time=2024-07-22T15:27:35.824Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama2597150250/runners/rocm_v60102/ollama_llama_server
Jul 22 15:27:35 quorra ollama[3678911]: time=2024-07-22T15:27:35.824Z level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cuda_v11 rocm_v60102 cpu cpu_avx cpu_avx2]"
Jul 22 15:27:35 quorra ollama[3678911]: time=2024-07-22T15:27:35.824Z level=DEBUG source=payload.go:45 msg="Override detection logic by setting OLLAMA_LLM_LIBRARY"
Jul 22 15:27:35 quorra ollama[3678911]: time=2024-07-22T15:27:35.824Z level=DEBUG source=sched.go:102 msg="starting llm scheduler"
Jul 22 15:27:35 quorra ollama[3678911]: time=2024-07-22T15:27:35.824Z level=INFO source=gpu.go:205 msg="looking for compatible GPUs"
Jul 22 15:27:35 quorra ollama[3678911]: time=2024-07-22T15:27:35.824Z level=DEBUG source=gpu.go:91 msg="searching for GPU discovery libraries for NVIDIA"
Jul 22 15:27:35 quorra ollama[3678911]: time=2024-07-22T15:27:35.824Z level=DEBUG source=gpu.go:468 msg="Searching for GPU library" name=libcuda.so*
Jul 22 15:27:35 quorra ollama[3678911]: time=2024-07-22T15:27:35.824Z level=DEBUG source=gpu.go:487 msg="gpu library search" globs="[/libcuda.so** /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]"
Jul 22 15:27:35 quorra ollama[3678911]: time=2024-07-22T15:27:35.824Z level=DEBUG source=gpu.go:521 msg="discovered GPU libraries" paths=[/usr/lib/x86_64-linux-gnu/libcuda.so.555.42.06]
Jul 22 15:27:35 quorra ollama[3678911]: CUDA driver version: 12.5
Jul 22 15:27:35 quorra ollama[3678911]: time=2024-07-22T15:27:35.903Z level=DEBUG source=gpu.go:124 msg="detected GPUs" count=1 library=/usr/lib/x86_64-linux-gnu/libcuda.so.555.42.06
Jul 22 15:27:35 quorra ollama[3678911]: [GPU-007c9d9a-8177-bd6f-7654-45652102b937] CUDA totalMem 15981 mb
Jul 22 15:27:35 quorra ollama[3678911]: [GPU-007c9d9a-8177-bd6f-7654-45652102b937] CUDA freeMem 15763 mb
Jul 22 15:27:35 quorra ollama[3678911]: [GPU-007c9d9a-8177-bd6f-7654-45652102b937] Compute Capability 8.9
Jul 22 15:27:36 quorra ollama[3678911]: time=2024-07-22T15:27:36.027Z level=DEBUG source=gpu.go:468 msg="Searching for GPU library" name=libze_intel_gpu.so
Jul 22 15:27:36 quorra ollama[3678911]: time=2024-07-22T15:27:36.027Z level=DEBUG source=gpu.go:487 msg="gpu library search" globs="[/libze_intel_gpu.so* /usr/lib/x86_64-linux-gnu/libze_intel_gpu.so* /usr/lib*/libze_intel_gpu.so*]"
Jul 22 15:27:36 quorra ollama[3678911]: time=2024-07-22T15:27:36.027Z level=DEBUG source=gpu.go:521 msg="discovered GPU libraries" paths=[]
Jul 22 15:27:36 quorra ollama[3678911]: releasing cuda driver library
Jul 22 15:27:36 quorra ollama[3678911]: panic: runtime error: invalid memory address or nil pointer dereference
Jul 22 15:27:36 quorra ollama[3678911]: [signal SIGSEGV: segmentation violation code=0x1 addr=0xc pc=0x832ad7]
Jul 22 15:27:36 quorra ollama[3678911]: goroutine 1 [running]:
Jul 22 15:27:36 quorra ollama[3678911]: github.com/ollama/ollama/gpu.GetGPUInfo()
Jul 22 15:27:36 quorra ollama[3678911]:         github.com/ollama/ollama/gpu/gpu.go:313 +0xdf7
Jul 22 15:27:36 quorra ollama[3678911]: github.com/ollama/ollama/server.Serve({0x1de902f8, 0xc000709b00})
Jul 22 15:27:36 quorra ollama[3678911]:         github.com/ollama/ollama/server/routes.go:1176 +0x7a5
Jul 22 15:27:36 quorra ollama[3678911]: github.com/ollama/ollama/cmd.RunServer(0xc00004cd00?, {0x1e723860?, 0x4?, 0x12a4ec5?})
Jul 22 15:27:36 quorra ollama[3678911]:         github.com/ollama/ollama/cmd/cmd.go:1084 +0xfa
Jul 22 15:27:36 quorra ollama[3678911]: github.com/spf13/cobra.(*Command).execute(0xc000174308, {0x1e723860, 0x0, 0x0})
Jul 22 15:27:36 quorra ollama[3678911]:         github.com/spf13/cobra@v1.7.0/command.go:940 +0x882
Jul 22 15:27:36 quorra ollama[3678911]: github.com/spf13/cobra.(*Command).ExecuteC(0xc000123508)
Jul 22 15:27:36 quorra ollama[3678911]:         github.com/spf13/cobra@v1.7.0/command.go:1068 +0x3a5
Jul 22 15:27:36 quorra ollama[3678911]: github.com/spf13/cobra.(*Command).Execute(...)
Jul 22 15:27:36 quorra ollama[3678911]:         github.com/spf13/cobra@v1.7.0/command.go:992
Jul 22 15:27:36 quorra ollama[3678911]: github.com/spf13/cobra.(*Command).ExecuteContext(...)
Jul 22 15:27:36 quorra ollama[3678911]:         github.com/spf13/cobra@v1.7.0/command.go:985
Jul 22 15:27:36 quorra ollama[3678911]: main.main()
Jul 22 15:27:36 quorra ollama[3678911]:         github.com/ollama/ollama/main.go:11 +0x4d
Jul 22 15:27:36 quorra systemd[1]: ollama.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Jul 22 15:27:36 quorra systemd[1]: ollama.service: Failed with result 'exit-code'.
Jul 22 15:27:36 quorra systemd[1]: ollama.service: Consumed 4.799s CPU time.
Jul 22 15:27:39 quorra systemd[1]: ollama.service: Scheduled restart job, restart counter is at 29.
Jul 22 15:27:39 quorra systemd[1]: Stopped Ollama Service.
Jul 22 15:27:39 quorra systemd[1]: ollama.service: Consumed 4.799s CPU time.

@TheSpaceGod commented on GitHub (Aug 7, 2024):

Any idea how this ticket will affect this effort? Is IPEX or Vulkan the better route to go for Intel GPUs?
https://github.com/ollama/ollama/issues/2033

@asknight1980 commented on GitHub (Aug 9, 2024):

> Any idea how this ticket will affect this effort? Is IPEX or Vulkan the better route to go for Intel GPUs? #2033

It seems as if IPEX is the only way at this time, and that's only if you use Ubuntu 22.04. It doesn't work at all for me on 24.04. Intel has released their own guidance here: https://www.intel.com/content/www/us/en/content-details/826081/running-ollama-with-open-webui-on-intel-hardware-platform.html Only follow this guide if you can babysit the system it is installed on at every reboot, because this guide does not enable any automatic service startups like Ollama and OpenWebUI include/intend by default. Very clunky, if you can even get it to work. I'm almost to the point of discarding my Intel GPUs in favor of AMD/NVIDIA because those simply work so much easier.

@TheSpaceGod commented on GitHub (Aug 9, 2024):

> It seems as if IPEX is the only way at this time, and that's only if you use Ubuntu 22.04. It doesn't work at all for me on 24.04. Intel has released their own guidance here: https://www.intel.com/content/www/us/en/content-details/826081/running-ollama-with-open-webui-on-intel-hardware-platform.html Only follow this guide if you can babysit the system it is installed on at every reboot, because this guide does not enable any automatic service startups like Ollama and OpenWebUI include/intend by default. Very clunky, if you can even get it to work. I'm almost to the point of discarding my Intel GPUs in favor of AMD/NVIDIA because those simply work so much easier.

Based on this chart, almost all of the Intel GPUs/iGPUs should support Vulkan 1.2+: https://www.intel.com/content/www/us/en/support/articles/000005524/graphics.html

I am really hoping this is an easier implementation route, because I agree, IPEX seems pretty hard to use in its current form. Even if Vulkan is slower than IPEX, some Intel GPU support will be better than nothing.

@celesrenata commented on GitHub (Aug 13, 2024):

> Is there any possibility to manually or automatically detect the internal GPU size? It seems the iGPU is detected as a oneAPI compute device:
>
> "inference compute" id=0 library=oneapi compute="" driver=0.0 name="Intel(R) UHD Graphics" total="0 B" available="0 B"
>
> But it seems that it is not correctly detecting the iGPU memory size. Note this is what I see using an Arc A380:
>
> "inference compute" id=0 library=oneapi compute="" driver=0.0 name="Intel(R) Arc(TM) A380 Graphics" total="5.9 GiB" available="5.6 GiB"
>
> I am guessing this is what prevents the iGPU from working?

I am also running into the 0 memory available iGPU issue, borrowing configs from @MordragT in NixOS on Intel 185H chips.

@slyoldfox commented on GitHub (Aug 14, 2024):

> > Is there any possibility to manually or automatically detect the internal GPU size? It seems the iGPU is detected as a oneAPI compute device:
> >
> > "inference compute" id=0 library=oneapi compute="" driver=0.0 name="Intel(R) UHD Graphics" total="0 B" available="0 B"
> >
> > But it seems that it is not correctly detecting the iGPU memory size. Note this is what I see using an Arc A380:
> >
> > "inference compute" id=0 library=oneapi compute="" driver=0.0 name="Intel(R) Arc(TM) A380 Graphics" total="5.9 GiB" available="5.6 GiB"
> >
> > I am guessing this is what prevents the iGPU from working?
>
> I am also running into the 0 memory available iGPU issue, borrowing configs from @MordragT in NixOS on Intel 185H chips.

I was seeing exactly the same issue

@celesrenata commented on GitHub (Aug 19, 2024):

I am trying another route: I have built SR-IOV support for my Arc iGPU and tested it successfully in Kube with Plex. Once RAM arrives today, I will attempt to see if I can run oneAPI/IPEX-LLM from KubeVirt VMs to give to Ollama. My attempt yesterday showed that it offloaded to CPU, but I had no RAM left. I'll try to update this thread if I have any success.

@sambartik commented on GitHub (Aug 21, 2024):

> > Any idea how this ticket will affect this effort? Is IPEX or Vulkan the better route to go for Intel GPUs? #2033
>
> It seems as if IPEX is the only way at this time, and that's only if you use Ubuntu 22.04. It doesn't work at all for me on 24.04. Intel has released their own guidance here: https://www.intel.com/content/www/us/en/content-details/826081/running-ollama-with-open-webui-on-intel-hardware-platform.html Only follow this guide if you can babysit the system it is installed on at every reboot, because this guide does not enable any automatic service startups like Ollama and OpenWebUI include/intend by default. Very clunky, if you can even get it to work. I'm almost to the point of discarding my Intel GPUs in favor of AMD/NVIDIA because those simply work so much easier.

Hi, I am on Ubuntu 24.04 LTS and got it working by using their container image `intelanalytics/ipex-llm-inference-cpp-xpu:latest`, more info [here](https://github.com/intel-analytics/ipex-llm/blob/main/docs/mddocs/DockerGuides/docker_cpp_xpu_quickstart.md).

The only caveat was that the container was not able to detect my GPU. Digging deeper, I found that the kernel that came with Ubuntu - 6.8.0-40-generic - was causing the issue. The workaround until it gets fixed was to set these environment variables:

      - NEOReadDebugKeys=1
      - OverrideGpuAddressSpace=48

After that, my GPU was detected. Also, because it is Docker, I don't need to worry about service startup as it is handled by Docker.
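
(For reference, a sketch of how that workaround can be wired into `docker run`, following the ipex-llm Docker quickstart linked above; the container name and model path are placeholders, and starting Ollama inside the container still follows that guide.)

```sh
# Start the IPEX-LLM container with the kernel workaround applied.
# The container name and the host model directory are placeholders;
# the Ollama start command is run inside the container per the linked guide.
docker run -itd \
  --name ipex-llm-ollama \
  --net=host \
  --device=/dev/dri \
  --shm-size=16g \
  -e NEOReadDebugKeys=1 \
  -e OverrideGpuAddressSpace=48 \
  -v /path/to/models:/root/.ollama/models \
  intelanalytics/ipex-llm-inference-cpp-xpu:latest
```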

@Xyz00777 commented on GitHub (Sep 3, 2024):

I'm a little bit confused atm, what is the state now? Because I have Ollama running natively on my Linux Debian VM with a passed-through Arc A770, without a container. I have the environment option for Intel GPUs enabled and I can see the card with lspci:

# lspci
00:10.0 VGA compatible controller: Intel Corporation DG2 [Arc A770] (rev 08)

but when I start the ollama service, it says that it can't find any GPU:

Sep 03 21:20:23 ollama ollama[988]: 2024/09/03 21:20:23 routes.go:1125: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:true OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QU>
Sep 03 21:20:23 ollama ollama[988]: time=2024-09-03T21:20:23.996+02:00 level=INFO source=images.go:753 msg="total blobs: 4"
Sep 03 21:20:24 ollama ollama[988]: time=2024-09-03T21:20:24.010+02:00 level=INFO source=images.go:760 msg="total unused blobs removed: 0"
Sep 03 21:20:24 ollama ollama[988]: time=2024-09-03T21:20:24.015+02:00 level=INFO source=routes.go:1172 msg="Listening on [::]:11434 (version 0.3.9)"
Sep 03 21:20:24 ollama ollama[988]: time=2024-09-03T21:20:24.017+02:00 level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama817142681/runners
Sep 03 21:20:37 ollama ollama[988]: time=2024-09-03T21:20:37.200+02:00 level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu cpu_avx cpu_avx2 cuda_v11 cuda_v12 rocm_v60102]"
Sep 03 21:20:37 ollama ollama[988]: time=2024-09-03T21:20:37.200+02:00 level=INFO source=gpu.go:200 msg="looking for compatible GPUs"
Sep 03 21:20:37 ollama ollama[988]: time=2024-09-03T21:20:37.214+02:00 level=INFO source=gpu.go:347 msg="no compatible GPUs were discovered"
Sep 03 21:20:37 ollama ollama[988]: time=2024-09-03T21:20:37.214+02:00 level=INFO source=types.go:107 msg="inference compute" id=0 library=cpu variant=avx2 compute="" driver=0.0 name="" total="15.5 GiB" available="14.5 GiB"
Sep 03 21:20:44 ollama ollama[988]: [GIN] 2024/09/03 - 21:20:44 | 200 |    7.026253ms |      172.17.0.2 | GET      "/api/tags"
Sep 03 21:21:09 ollama ollama[988]: time=2024-09-03T21:21:09.653+02:00 level=INFO source=memory.go:309 msg="offload to cpu" layers.requested=-1 layers.model=33 layers.offload=0 layers.split="" memory.available="[14.1 GiB]" memory.required.full="9.2 GiB" memory.required.partial="0 B" memory.required.kv="1.0 GiB" memory.required.allocations="[9.2 GiB]" memory.weights.total=">

@tannisroot commented on GitHub (Sep 3, 2024):

im a little bit confused atm, what is the state now atm? Because I have ollama running natively on my linux debian vm with a pass throught Arc 770 without container. I have the environment option for intel GPUs enabled and I can see it with lspci

# lspci
00:10.0 VGA compatible controller: Intel Corporation DG2 [Arc A770] (rev 08)

but when i start the ollama service it says that it cant find any GPU

Sep 03 21:20:23 ollama ollama[988]: 2024/09/03 21:20:23 routes.go:1125: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:true OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QU>
Sep 03 21:20:23 ollama ollama[988]: time=2024-09-03T21:20:23.996+02:00 level=INFO source=images.go:753 msg="total blobs: 4"
Sep 03 21:20:24 ollama ollama[988]: time=2024-09-03T21:20:24.010+02:00 level=INFO source=images.go:760 msg="total unused blobs removed: 0"
Sep 03 21:20:24 ollama ollama[988]: time=2024-09-03T21:20:24.015+02:00 level=INFO source=routes.go:1172 msg="Listening on [::]:11434 (version 0.3.9)"
Sep 03 21:20:24 ollama ollama[988]: time=2024-09-03T21:20:24.017+02:00 level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama817142681/runners
Sep 03 21:20:37 ollama ollama[988]: time=2024-09-03T21:20:37.200+02:00 level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu cpu_avx cpu_avx2 cuda_v11 cuda_v12 rocm_v60102]"
Sep 03 21:20:37 ollama ollama[988]: time=2024-09-03T21:20:37.200+02:00 level=INFO source=gpu.go:200 msg="looking for compatible GPUs"
Sep 03 21:20:37 ollama ollama[988]: time=2024-09-03T21:20:37.214+02:00 level=INFO source=gpu.go:347 msg="no compatible GPUs were discovered"
Sep 03 21:20:37 ollama ollama[988]: time=2024-09-03T21:20:37.214+02:00 level=INFO source=types.go:107 msg="inference compute" id=0 library=cpu variant=avx2 compute="" driver=0.0 name="" total="15.5 GiB" available="14.5 GiB"
Sep 03 21:20:44 ollama ollama[988]: [GIN] 2024/09/03 - 21:20:44 | 200 |    7.026253ms |      172.17.0.2 | GET      "/api/tags"
Sep 03 21:21:09 ollama ollama[988]: time=2024-09-03T21:21:09.653+02:00 level=INFO source=memory.go:309 msg="offload to cpu" layers.requested=-1 layers.model=33 layers.offload=0 layers.split="" memory.available="[14.1 GiB]" memory.required.full="9.2 GiB" memory.required.partial="0 B" memory.required.kv="1.0 GiB" memory.required.allocations="[9.2 GiB]" memory.weights.total=">

How did you compile Ollama?

<!-- gh-comment-id:2327308794 --> @tannisroot commented on GitHub (Sep 3, 2024): > im a little bit confused atm, what is the state now atm? Because I have ollama running natively on my linux debian vm with a pass throught Arc 770 without container. I have the environment option for intel GPUs enabled and I can see it with lspci > > ``` > # lspci > 00:10.0 VGA compatible controller: Intel Corporation DG2 [Arc A770] (rev 08) > ``` > > but when i start the ollama service it says that it cant find any GPU > > ``` > Sep 03 21:20:23 ollama ollama[988]: 2024/09/03 21:20:23 routes.go:1125: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:true OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QU> > Sep 03 21:20:23 ollama ollama[988]: time=2024-09-03T21:20:23.996+02:00 level=INFO source=images.go:753 msg="total blobs: 4" > Sep 03 21:20:24 ollama ollama[988]: time=2024-09-03T21:20:24.010+02:00 level=INFO source=images.go:760 msg="total unused blobs removed: 0" > Sep 03 21:20:24 ollama ollama[988]: time=2024-09-03T21:20:24.015+02:00 level=INFO source=routes.go:1172 msg="Listening on [::]:11434 (version 0.3.9)" > Sep 03 21:20:24 ollama ollama[988]: time=2024-09-03T21:20:24.017+02:00 level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama817142681/runners > Sep 03 21:20:37 ollama ollama[988]: time=2024-09-03T21:20:37.200+02:00 level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu cpu_avx cpu_avx2 cuda_v11 cuda_v12 rocm_v60102]" > Sep 03 21:20:37 ollama ollama[988]: time=2024-09-03T21:20:37.200+02:00 level=INFO source=gpu.go:200 msg="looking for compatible GPUs" > Sep 03 21:20:37 ollama ollama[988]: time=2024-09-03T21:20:37.214+02:00 level=INFO source=gpu.go:347 msg="no compatible GPUs were discovered" > Sep 03 21:20:37 ollama ollama[988]: time=2024-09-03T21:20:37.214+02:00 level=INFO source=types.go:107 msg="inference compute" id=0 library=cpu variant=avx2 compute="" driver=0.0 name="" total="15.5 GiB" available="14.5 GiB" > Sep 03 21:20:44 ollama ollama[988]: [GIN] 2024/09/03 - 21:20:44 | 200 | 7.026253ms | 172.17.0.2 | GET "/api/tags" > Sep 03 21:21:09 ollama ollama[988]: time=2024-09-03T21:21:09.653+02:00 level=INFO source=memory.go:309 msg="offload to cpu" layers.requested=-1 layers.model=33 layers.offload=0 layers.split="" memory.available="[14.1 GiB]" memory.required.full="9.2 GiB" memory.required.partial="0 B" memory.required.kv="1.0 GiB" memory.required.allocations="[9.2 GiB]" memory.weights.total="> > ``` How did you compile Ollama?
Author
Owner

@Xyz00777 commented on GitHub (Sep 3, 2024):

I didn't compile it, I just downloaded it with Ansible; this is what I did:

- name: Download and extract Ollama package
  hosts: localhost
  become: yes
  tasks:
    - name: Download Ollama tarball
      get_url:
        url: https://ollama.com/download/ollama-linux-amd64.tgz
        dest: /tmp/ollama-linux-amd64.tgz
        mode: '0644'

    - name: Extract Ollama tarball to /usr
      unarchive:
        src: /tmp/ollama-linux-amd64.tgz
        dest: /usr
        remote_src: yes
        extra_opts: [--strip-components=1]
        creates: /usr/ollama  # This prevents re-extraction if the target already exists

    - name: Create Ollama user
      user:
        name: ollama
        system: yes
        shell: /bin/false
        home: /usr/share/ollama
        create_home: yes

    - name: Create systemd service file for Ollama
      copy:
        dest: /etc/systemd/system/ollama.service
        content: |
          [Unit]
          Description=Ollama Service
          After=network-online.target

          [Service]
          Environment="HOME=/mnt/ollama/llms"
          Environment="OLLAMA_INTEL_GPU=1"
          Environment="OLLAMA_HOST=0.0.0.0"
          ExecStart=/usr/bin/ollama serve
          User=ollama
          Group=ollama
          Restart=always
          RestartSec=3

          [Install]
          WantedBy=default.target
        owner: root
        group: root
        mode: '0644'
      notify: reload systemd

    - name: Enable and start Ollama service
      systemd:
        name: ollama
        enabled: yes
        state: started

  handlers:
    # Handler referenced by 'notify: reload systemd' above
    - name: reload systemd
      systemd:
        daemon_reload: yes
<!-- gh-comment-id:2327329267 --> @Xyz00777 commented on GitHub (Sep 3, 2024): i didnt compiled it, i just downloaded it with ansible, thats what i had done: ``` - name: Download and extract Ollama package hosts: localhost become: yes tasks: - name: Download Ollama tarball get_url: url: https://ollama.com/download/ollama-linux-amd64.tgz dest: /tmp/ollama-linux-amd64.tgz mode: '0644' - name: Extract Ollama tarball to /usr unarchive: src: /tmp/ollama-linux-amd64.tgz dest: /usr remote_src: yes extra_opts: [--strip-components=1] creates: /usr/ollama # This prevents re-extraction if the target already exists - name: Create Ollama user user: name: ollama system: yes shell: /bin/false home: /usr/share/ollama create_home: yes - name: Create systemd service file for Ollama copy: dest: /etc/systemd/system/ollama.service content: | [Unit] Description=Ollama Service After=network-online.target [Service] Environment="HOME=/mnt/ollama/llms" Environment="OLLAMA_INTEL_GPU=1" Environment="OLLAMA_HOST=0.0.0.0" ExecStart=/usr/bin/ollama serve User=ollama Group=ollama Restart=always RestartSec=3 [Install] WantedBy=default.target owner: root group: root mode: '0644' notify: reload systemd - name: Enable and start Ollama service systemd: name: ollama enabled: yes state: started ```
Author
Owner

@tannisroot commented on GitHub (Sep 3, 2024):

The release builds of Ollama are not compiled with oneAPI (Intel) support.
You need to fetch the repo, install the Level Zero drivers and intel-basekit (see Intel's website), activate the oneAPI runtime, and then compile with the relevant environment variables enabled.

<!-- gh-comment-id:2327393726 --> @tannisroot commented on GitHub (Sep 3, 2024): Release version of Ollama is not compiled with OneAPI (Intel) support. You need to fetch the repo, install level zero drivers, intel-basekit (info on Intel's website), activate runtime and then compile with certain envars enabled
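For reference, on the v0.3.x source tree that process looked roughly like the sketch below (the apt package name and oneAPI paths follow Intel's standard Linux layout and are assumptions; adjust for your distro):

# Sketch only: build Ollama v0.3.x with the oneAPI (SYCL) runner enabled.

# 1. Install the oneAPI Base Toolkit (SYCL compiler + runtime libraries).
sudo apt install intel-basekit

# 2. Activate the oneAPI environment in the current shell so the build can find it.
source /opt/intel/oneapi/setvars.sh

# 3. Fetch and build. On the v0.3.x tree, `go generate` compiled the bundled
#    llama.cpp runners; with the oneAPI environment active it also produced
#    the oneapi runner.
git clone https://github.com/ollama/ollama.git
cd ollama
git checkout v0.3.14   # or whichever v0.3.x tag you want
go generate ./...
go build .

# 4. Run the freshly built binary with Intel GPU discovery enabled.
OLLAMA_INTEL_GPU=1 ./ollama serve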
Author
Owner

@Xyz00777 commented on GitHub (Sep 3, 2024):

Release version of Ollama is not compiled with OneAPI (Intel) support. You need to fetch the repo, install level zero drivers, intel-basekit (info on Intel's website), activate runtime and then compile with certain envars enabled

So I either have to wait until an Intel-compatible Linux package is available for download, or I have to compile it myself with the packages you mentioned installed? And would I still need those packages once an Intel-compatible download is released? (Sorry, I'm a beginner in this area and have no experience.)

<!-- gh-comment-id:2327407438 --> @Xyz00777 commented on GitHub (Sep 3, 2024): > Release version of Ollama is not compiled with OneAPI (Intel) support. You need to fetch the repo, install level zero drivers, intel-basekit (info on Intel's website), activate runtime and then compile with certain envars enabled so i have to wait until at least the intel compatibel linux package is updated to can be downloaded or i have to compile it and have to have installed packages you mentioned? or do i even than need these packages if the intel compatible package for download got releases (sorry im a beginner in these area and have no experience)
Author
Owner

@tannisroot commented on GitHub (Sep 3, 2024):

Release version of Ollama is not compiled with OneAPI (Intel) support. You need to fetch the repo, install level zero drivers, intel-basekit (info on Intel's website), activate runtime and then compile with certain envars enabled

so i have to wait until at least the intel compatibel linux package is updated to can be downloaded or i have to compile it and have to have installed packages you mentioned? or do i even than need these packages if the intel compatible package for download got releases (sorry im a beginner in these area and have no experience)

I believe Ollama does plan to provide an Intel-supporting package at some point in the near future.
Meanwhile you can try building it on your own. If you need help with that, @ me on the official Ollama Discord; I'll be glad to assist you during European daytime hours!

<!-- gh-comment-id:2327593956 --> @tannisroot commented on GitHub (Sep 3, 2024): > > Release version of Ollama is not compiled with OneAPI (Intel) support. You need to fetch the repo, install level zero drivers, intel-basekit (info on Intel's website), activate runtime and then compile with certain envars enabled > > so i have to wait until at least the intel compatibel linux package is updated to can be downloaded or i have to compile it and have to have installed packages you mentioned? or do i even than need these packages if the intel compatible package for download got releases (sorry im a beginner in these area and have no experience) I believe Ollama does plan to provide an Intel supporting package at some point in the near future. Meanwhile you can try building on your own. If you need help with that, @ me on the official Ollama discord, I'll be glad to assist you during european day hours!
Author
Owner

@xiangyang-95 commented on GitHub (Oct 4, 2024):

It would be great if we could download, extract, and run Ollama on an Intel GPU directly. For example:

curl -L https://ollama.com/download/ollama-linux-amd64-sycl.tgz -o ollama-linux-amd64-sycl.tgz
sudo tar -C /usr -xzf ollama-linux-amd64-sycl.tgz

I am willing to contribute this feature if needed.

<!-- gh-comment-id:2392586023 --> @xiangyang-95 commented on GitHub (Oct 4, 2024): It would be great if we could download, extract, and run Ollama on an Intel GPU directly. The example would be like ``` curl -L https://ollama.com/download/ollama-linux-amd64-sycl.tgz -o ollama-linux-amd64-sycl.tgz sudo tar -C /usr -xzf ollama-linux-amd64-sycl.tgz ``` I am willing to contribute this feature if needed.
Author
Owner

@semidark commented on GitHub (Oct 10, 2024):

With the help of @tannisroot, I successfully compiled Ollama with Intel GPU support from source.

The process was quite straightforward, and everything went smoothly. I had high hopes since I've been running llama.cpp standalone with my iGPU for the past few weeks. However, when I ran Ollama, it detected my iGPU, but the integrated llama.cpp server did not use it.

I suspect this is related to Ollama's handling of the unified memory on the iGPU, as mentioned by @dhiltgen in this comment .

Here is some output where Ollama reports that the memory size is 0 Bytes:

time=2024-10-10T22:44:15.930+02:00 level=INFO source=gpu.go:199 msg="looking for compatible GPUs"
time=2024-10-10T22:44:15.948+02:00 level=INFO source=types.go:107 msg="inference compute" id=0 library=oneapi variant="" compute="" driver=0.0 name="Intel(R) Iris(R) Xe Graphics" total="0 B" available="0 B"

To investigate, I ran the ollama_llama_server directly without using Ollama, and it seemed to recognize my iGPU and Unified RAM as expected:

cd llm/build/linux/amd64/oneapi/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:.
./ollama_llama_server -m ~/src/llama.cpp/models/gemma-2-2b-it-Q8_0.gguf -c 4000 -ngl 28 --host 127.0.0.1 --port 3000
[...]
ggml_sycl_init: SYCL_USE_XMX: yes
ggml_sycl_init: found 1 SYCL devices:
get_memory_info: [warning] ext_intel_free_memory is not supported (export/set ZES_ENABLE_SYSMAN=1 to support), use total memory as free memory
llm_load_tensors: ggml ctx size =    0.26 MiB
llm_load_tensors: offloading 26 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 27/27 layers to GPU
llm_load_tensors:      SYCL0 buffer size =  2649.78 MiB
llm_load_tensors:        CPU buffer size =   597.66 MiB
[...]

So, how can I get Ollama to recognize the Unified Memory on my iGPU? Could we consider a quick fix to the GPU identification code, perhaps forcing Ollama to work with Unified Memory when the ZES_ENABLE_SYSMAN=1 environment variable is set?

<!-- gh-comment-id:2406086817 --> @semidark commented on GitHub (Oct 10, 2024): With the help of @tannisroot, I successfully compiled Ollama with Intel GPU support from source. The process was quite straightforward, and everything went smoothly. I had high hopes since I've been running llama.cpp standalone with my iGPU for the past few weeks. However, when I ran Ollama, it detected my iGPU, but the integrated llama.cpp server did not use it. I suspect this is related to Ollama's handling of the unified memory on the iGPU, as mentioned by @dhiltgen in [this comment](https://github.com/ollama/ollama/issues/5387#issuecomment-2204423270) . Here is some output where Ollama reports that the memory size is 0 Bytes: ``` time=2024-10-10T22:44:15.930+02:00 level=INFO source=gpu.go:199 msg="looking for compatible GPUs" time=2024-10-10T22:44:15.948+02:00 level=INFO source=types.go:107 msg="inference compute" id=0 library=oneapi variant="" compute="" driver=0.0 name="Intel(R) Iris(R) Xe Graphics" total="0 B" available="0 B" ``` To investigate, I ran the `ollama_llama_server` directly without using Ollama, and it seemed to recognize my iGPU and Unified RAM as expected: ``` cd llm/build/linux/amd64/oneapi/bin export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:. ./ollama_llama_server -m ~/src/llama.cpp/models/gemma-2-2b-it-Q8_0.gguf -c 4000 -ngl 28 --host 127.0.0.1 --port 3000 ``` ``` [...] ggml_sycl_init: SYCL_USE_XMX: yes ggml_sycl_init: found 1 SYCL devices: get_memory_info: [warning] ext_intel_free_memory is not supported (export/set ZES_ENABLE_SYSMAN=1 to support), use total memory as free memory llm_load_tensors: ggml ctx size = 0.26 MiB llm_load_tensors: offloading 26 repeating layers to GPU llm_load_tensors: offloading non-repeating layers to GPU llm_load_tensors: offloaded 27/27 layers to GPU llm_load_tensors: SYCL0 buffer size = 2649.78 MiB llm_load_tensors: CPU buffer size = 597.66 MiB [...] ``` So, how can I get Ollama to recognize the Unified Memory on my iGPU? Could we consider a quick fix to the GPU identification code, perhaps forcing Ollama to work with Unified Memory when the `ZES_ENABLE_SYSMAN=1` environment variable is set?
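One low-effort thing to try (a sketch, not a confirmed fix: the warning above comes from the SYCL backend, and Ollama's own discovery code may still report 0 B regardless) is setting ZES_ENABLE_SYSMAN=1 in the environment Ollama runs under, e.g. for a systemd-managed install:

# Add the variable to the ollama systemd service via a drop-in
sudo systemctl edit ollama
# In the editor that opens, add:
#   [Service]
#   Environment="ZES_ENABLE_SYSMAN=1"
# then restart the service
sudo systemctl restart ollama

For a manually started binary, a plain export ZES_ENABLE_SYSMAN=1 before ollama serve does the same thing.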
Author
Owner

@Gunnarr970 commented on GitHub (Oct 11, 2024):

Here is an ipex-llm beta. It allows ollama to work on very old "HD Graphics 630" using SYCL.

<!-- gh-comment-id:2407138095 --> @Gunnarr970 commented on GitHub (Oct 11, 2024): Here is an [ipex-llm beta](https://github.com/intel-analytics/ipex-llm/issues/12120#issuecomment-2403947706). It allows ollama to work on very old "HD Graphics 630" using SYCL.
Author
Owner

@celesrenata commented on GitHub (Oct 13, 2024):

I am trying another route, I have build SR-IOV support for my ARC iGPU, and tested it successfully in Kube with plex. Once RAM arrives today, I will attempt to see if I can run OneAPI/IPEX-LLM from kubevirts to give to Ollama. My attempt yesterday showed that it offloaded to CPU, but I had no RAM left. I'll try to update this thread if I have any success.

I did in the end have success with my little project:
https://github.com/celesrenata/nixos-k3s-configs
specifically with Ubuntu KubeVirt VMs. If you want to borrow from my work, I suggest looking into https://github.com/celesrenata/nixos-k3s-configs/blob/main/kubevirt/ipex-1x/bootstrap-ipex-fleet.sh, which works with Ubuntu 24.04 LTS.

<!-- gh-comment-id:2409092251 --> @celesrenata commented on GitHub (Oct 13, 2024): > I am trying another route, I have build SR-IOV support for my ARC iGPU, and tested it successfully in Kube with plex. Once RAM arrives today, I will attempt to see if I can run OneAPI/IPEX-LLM from kubevirts to give to Ollama. My attempt yesterday showed that it offloaded to CPU, but I had no RAM left. I'll try to update this thread if I have any success. I did in the end have success with my little project. https://github.com/celesrenata/nixos-k3s-configs specifically with Ubuntu Kubevirts. So if you want to borrow from my work, I suggest looking into: https://github.com/celesrenata/nixos-k3s-configs/blob/main/kubevirt/ipex-1x/bootstrap-ipex-fleet.sh works with Ubuntu 24.04 LTS
Author
Owner

@WoutvanderAa commented on GitHub (Nov 6, 2024):

Do the Arc cards already work? I have an Intel Arc A380 in my Unraid server at the moment and I would love to use it for Ollama.

<!-- gh-comment-id:2460753453 --> @WoutvanderAa commented on GitHub (Nov 6, 2024): do the arc cards already work? I have a intel arc a380 in my unraid server atm and I would love to use it for ollama.
Author
Owner

@yurhett commented on GitHub (Nov 10, 2024):

Hi @dhiltgen,

Thank you for your hard work and dedication to improving ollama. I've reviewed the changes introduced in the 0.4 update and noticed that a significant portion of the codebase has been restructured, and the build system has transitioned to using make. Consequently, support for Intel GPUs has been excluded in this update.

However, it's worth noting that upstream llama.cpp has now officially added support for Intel GPUs. Considering this development, I would like to inquire if there are plans to integrate Intel GPU support into future releases of ollama.

Thank you for your time and consideration.

<!-- gh-comment-id:2466808854 --> @yurhett commented on GitHub (Nov 10, 2024): Hi @dhiltgen, Thank you for your hard work and dedication to improving ollama. I've reviewed the changes introduced in the 0.4 update and noticed that a significant portion of the codebase has been restructured, and the build system has transitioned to using make. Consequently, support for Intel GPUs has been excluded in this update. However, it's worth noting that upstream llama.cpp has now officially added support for Intel GPUs. Considering this development, I would like to inquire if there are plans to integrate Intel GPU support into future releases of ollama. Thank you for your time and consideration.
Author
Owner

@pepijndevos commented on GitHub (Nov 10, 2024):

It seems indeed 0.4 just does not build Intel Arc support using the method suggested above. Is there another method?

For now it seems git checkout v0.3.14 will get you... somewhere, but currently still playing whack-a-mole with compiler errors.

The reason I'm trying to build from source is that the ipex-llm bundled version appears broken
https://github.com/intel-analytics/ipex-llm/issues/12374

Update: I built from source, result:

Abort was called at 1078 line in file:
/usr/src/debug/intel-compute-runtime/compute-runtime-24.39.31294.12/shared/source/os_interface/linux/drm_neo.cpp
<!-- gh-comment-id:2466921013 --> @pepijndevos commented on GitHub (Nov 10, 2024): It seems indeed 0.4 just does not build Intel Arc support using the method suggested above. Is there another method? For now it seems `git checkout v0.3.14` will get you... somewhere, but currently still playing whack-a-mole with compiler errors. The reason I'm trying to build from source is that the ipex-llm bundled version appears broken https://github.com/intel-analytics/ipex-llm/issues/12374 Update: I built from source, result: ``` Abort was called at 1078 line in file: /usr/src/debug/intel-compute-runtime/compute-runtime-24.39.31294.12/shared/source/os_interface/linux/drm_neo.cpp ```
Author
Owner

@yurhett commented on GitHub (Nov 11, 2024):

It seems indeed 0.4 just does not build Intel Arc support using the method suggested above. Is there another method?

For now it seems git checkout v0.3.14 will get you... somewhere, but currently still playing whack-a-mole with compiler errors.

The reason I'm trying to build from source is that the ipex-llm bundled version appears broken intel-analytics/ipex-llm#12374

Update: I built from source, result:

Abort was called at 1078 line in file:
/usr/src/debug/intel-compute-runtime/compute-runtime-24.39.31294.12/shared/source/os_interface/linux/drm_neo.cpp

Thanks! That's a big discovery. I will try the 0.3 version on Windows to verify its correctness. I hope someone can guide this issue back on track.

Update:

  • I attempted to compile the 0.3 version on Windows, but the process fails at the GGML part. I will check if this is an environment-related issue.
  • I found a build provided at this link, but it crashes when the model is offloaded to the GPU, possibly due to an outdated version of oneAPI.
  • Many old Intel GPU-related PRs should be stopped from merging, as they are now incompatible with the post-0.4 code structure and build tools.

Given the current situation, I would like to know if the collaborators are willing to continue supporting Intel GPUs in Ollama. The current state is quite problematic, and clarity on this matter would help the community determine the next steps.
<!-- gh-comment-id:2467094367 --> @yurhett commented on GitHub (Nov 11, 2024): > It seems indeed 0.4 just does not build Intel Arc support using the method suggested above. Is there another method? > > For now it seems `git checkout v0.3.14` will get you... somewhere, but currently still playing whack-a-mole with compiler errors. > > The reason I'm trying to build from source is that the ipex-llm bundled version appears broken [intel-analytics/ipex-llm#12374](https://github.com/intel-analytics/ipex-llm/issues/12374) > > Update: I built from source, result: > > ``` > Abort was called at 1078 line in file: > /usr/src/debug/intel-compute-runtime/compute-runtime-24.39.31294.12/shared/source/os_interface/linux/drm_neo.cpp > ``` Thanks! Thats a big discovery. I will try the 0.3 version on Windows to verify its correctness. I hope someone could guide this issue back on track. Update: - I attempted to compile the 0.3 version on Windows, but the process fails at the GGML part. I will check if this is an environment-related issue. - I found a build provided at [this link](https://github.com/zhewang1-intc/ollama/releases), but it crashes when the model offloading to the GPU, possibly due to an outdated version of oneAPI. - Many old Intel GPU-related PRs should be stopped from merging as they are now incompatible with the post-0.4 code structure and build tools. Given the current situation, I would like to know if the collaborators are willing to continue supporting Intel GPU in ollama. The current state is quite problematic, and clarity on this matter would help the community determine the next steps.
Author
Owner

@peremenov commented on GitHub (Nov 11, 2024):

Hello!
I managed to run an official IPEX Docker image with Ollama. My system specs are: AMD Ryzen 5 5600, 128 GB of RAM, and an Intel Arc A380, on Ubuntu 24.04 LTS. There are issues I faced during the experiments that I didn't manage to resolve: Ollama only managed to work with 1 layer of the model offloaded to the GPU, and the logs don't show anything meaningful (probably due to the lower-tier GPU). Also, older models work fine, but the newer ones not so much. I think this happens because Ollama has evolved over time and the IPEX Docker image bundles an older version.
Any insights or suggestions regarding these issues would be appreciated.
Here is a docker-compose file which I used to run the container.
Thank you

networks:
  llms:
    external: true

services:
  ipex-llm:
    image: intelanalytics/ipex-llm-inference-cpp-xpu:latest
    restart: unless-stopped

    command: >
      /bin/bash -c "
        sycl-ls &&
        source ipex-llm-init --gpu --device Arc &&

        bash ./scripts/start-ollama.sh && # run the scripts
        kill $(pgrep -f ollama) && # kill background ollama
        /llm/ollama/ollama serve # run foreground ollama
      "
    devices:
      - /dev/dri
    volumes:
      - /dev/dri:/dev/dri
      - ${LLAMA_MODELS_DIR}:/models
      - ${OLLAMA_DIR}:/root/.ollama
    environment:
      # no_proxy: localhost,127.0.0.1
      DEVICE: Arc
      NEOReadDebugKeys: 1
      OverrideGpuAddressSpace: 48
      ZES_ENABLE_SYSMAN: 1

      OLLAMA_DEBUG: 1
      OLLAMA_INTEL_GPU: 1
      OLLAMA_NUM_PARALLEL: 1
      OLLAMA_HOST: 0.0.0.0
      OLLAMA_NUM_GPU: 1 # layers to offload

      SYCL_CACHE_PERSISTENT: 1
      SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS: 1
      ONEAPI_DEVICE_SELECTOR: level_zero:0

    shm_size: '8g'
    networks:
      - llms
<!-- gh-comment-id:2468429913 --> @peremenov commented on GitHub (Nov 11, 2024): Hello! I managed to run an official IPEX Docker image with Ollama. My system specs are: AMD Ryzen 5 5600, 128Gb of RAM and Intel Arc a380, Ubuntu 24.04 LTS. There are issues I faced during the experiments I didn't manage to resolve. Ollama only managed to work with 1 layer of the model offloaded to GPU, the logs don't show anything meaningful (probably due to lower-tier GPU). Also older models work fine, but the newer ones not so much. I think it happens because Ollama has evolved over time, and there is an older version in IPEX Docker image. Any insights or suggestions regarding these issues would be appreciated. Here is a docker-compose file which I used to run the container. Thank you ```yml networks: llms: external: true services: ipex-llm: image: intelanalytics/ipex-llm-inference-cpp-xpu:latest restart: unless-stopped command: > /bin/bash -c " sycl-ls && source ipex-llm-init --gpu --device Arc && bash ./scripts/start-ollama.sh && # run the scripts kill $(pgrep -f ollama) && # kill background ollama /llm/ollama/ollama serve # run foreground ollama " devices: - /dev/dri volumes: - /dev/dri:/dev/dri - ${LLAMA_MODELS_DIR}:/models - ${OLLAMA_DIR}:/root/.ollama environment: # no_proxy: localhost,127.0.0.1 DEVICE: Arc NEOReadDebugKeys: 1 OverrideGpuAddressSpace: 48 ZES_ENABLE_SYSMAN: 1 OLLAMA_DEBUG: 1 OLLAMA_INTEL_GPU: 1 OLLAMA_NUM_PARALLEL: 1 OLLAMA_HOST: 0.0.0.0 OLLAMA_NUM_GPU: 1 # layers to offload SYCL_CACHE_PERSISTENT: 1 SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS: 1 ONEAPI_DEVICE_SELECTOR: level_zero:0 shm_size: '8g' networks: - llms ```
Author
Owner

@marcin-kruszynski commented on GitHub (Nov 19, 2024):

@peremenov
Thanks for the YAML, it's working very well with my Meteor Lake Arc iGPU.
Unfortunately, ipex-llm uses Ollama version 0.3.6, which will not run some newer models (e.g. llama3.2-vision).

The trick with the layers is probably setting OLLAMA_NUM_GPU to 999.
I found this in the ipex-llm docs:
[screenshot from the ipex-llm Ollama quickstart]
https://github.com/intel-analytics/ipex-llm/blob/main/docs/mddocs/Quickstart/ollama_quickstart.md

<!-- gh-comment-id:2485600286 --> @marcin-kruszynski commented on GitHub (Nov 19, 2024): @peremenov Thanks for the yaml, it's working very good in case of my Meteor Lake Arc iGPU Unfortunately, ipex-llm uses Ollama version 0.3.6 which will not run some new models (f.e. llama3.2-vision). The trick with layers is probably setting OLLAMA_NUM_GPU to 999 I found this in ipex-llm docs: ![image](https://github.com/user-attachments/assets/027e066f-b38d-47a5-9476-7688627617c8) https://github.com/intel-analytics/ipex-llm/blob/main/docs/mddocs/Quickstart/ollama_quickstart.md
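For reference, the relevant part of the linked quickstart boils down to exporting the variables before starting the bundled server (a sketch from those docs; check the page above for the exact, current wording):

# From the ipex-llm Ollama quickstart: offload all layers and let Level Zero report memory
export OLLAMA_NUM_GPU=999
export ZES_ENABLE_SYSMAN=1
source /opt/intel/oneapi/setvars.sh   # activate the oneAPI runtime first
./ollama serve

Whether 999 is actually stable seems to depend on the GPU, as the next reply notes.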
Author
Owner

@peremenov commented on GitHub (Nov 20, 2024):

Hey, @marcin-kruszynski
Thank you for the response. Yes, I'm aware of the OLLAMA_NUM_GPU setting. I tried different values, but OLLAMA_NUM_GPU=1 is the only value with which I managed to get stable performance. OLLAMA_NUM_GPU=2 works OK, but crashes sometimes. OLLAMA_NUM_GPU=999 crashes every time, even on small models that should fit in VRAM. IDK, maybe it's somehow specific to my configuration. You're totally right about the older Ollama version used in the IPEX image; it can't run llama3.2.
I'm really looking forward to seeing https://github.com/ollama/ollama/pull/5059 working in future releases, because as I understand it, the authors of Ollama aren't planning to add support for SYCL, oneAPI, or anything like that.

<!-- gh-comment-id:2487867145 --> @peremenov commented on GitHub (Nov 20, 2024): Hey, @marcin-kruszynski Thank you for the response. Yes, I'm aware about `OLLAMA_NUM_GPU` setting. Tried different values, but `OLLAMA_NUM_GPU=1` is only value when I managed to get stable performance. `OLLAMA_NUM_GPU=2` works ok, but crashes sometimes. `OLLAMA_NUM_GPU=999` crashes every time even on small models that should fit in VRAM. IDK, maybe it's somehow specific to my configuration. You totally right about older Ollama version used in IPEX image. It can't run llama3.2. I'm really looking forward to see https://github.com/ollama/ollama/pull/5059 working in the future releases, because as I understand the authors of Ollama aren't planning to add support of SYCL, One API or anything like that.
Author
Owner

@Kamryx commented on GitHub (Dec 10, 2024):

Hey everyone, just wanted to check in again: how are we looking on this now, both present and near future? Again, my understanding is unfortunately pretty limited, but from what I've gathered, Arc support was here and then got removed in the 0.4 update?

I've seen there's another fork that aims to be a comprehensive and easy-to-install Arc-focused Ollama instance, but it'd be really nice to just rely on the main Ollama project and not have to juggle or flip between different Ollama builds on my system, especially if I change GPU vendors. I don't actually even know if the aforementioned fork is working right now either.

But I'm sure most of us are aware of the new Battlemage GPUs and… yeah, they're yet again even more compelling than Arc was before. 16 GB A770s are $230 right now too, with memory bandwidth that beats most of NVIDIA's 40 series. So I'm pretty antsy. I could use llama.cpp (I think?), but the Ollama ecosystem is so awesome and I would love to stick with it.

<!-- gh-comment-id:2529970982 --> @Kamryx commented on GitHub (Dec 10, 2024): Hey everyone, just wanted to check in again, how are we looking on this now, both present and near future? Again my understanding is unfortunately pretty limited but from what I’ve gathered Arc support was here and then got removed in the .4 update? I’ve seen there’s another fork that aims to be a comprehensive and easy to install Arc focused Ollama instance, but it’d be really nice to just rely on the main Ollama project and not have to juggle or flip between different Ollama builds on my system especially if I change GPU vendors. I don’t actually even know if the aforementioned fork is working right now either. But I’m sure most of us are aware of the new Battlemage GPUs and… yeah, they’re yet again even more compelling than Arc was before. 16GB A770s are $230 right now too with memory bandwidth that beats most of NVIDIA 40 series. So I'm pretty antsy. Could use llama.cpp (I think?) but the Ollama ecosystem is so awesome and would love to stick with it
Author
Owner

@Leo512bit commented on GitHub (Dec 10, 2024):

I’ve seen there’s another fork that aims to be a comprehensive and easy to
install Arc focused Ollama

Can you link to the fork? I'd like to take a look at it. Thanks.

<!-- gh-comment-id:2530059717 --> @Leo512bit commented on GitHub (Dec 10, 2024): >I’ve seen there’s another fork that aims to be a comprehensive and easy to install Arc focused Ollama Can you link to the fork? I'd like to take a look at it. Thanks. On Mon, Dec 9, 2024, 5:22 PM Kamryx ***@***.***> wrote: > Hey everyone, just wanted to check in again, how are we looking on this > now, both present and near future? Again my understanding is unfortunately > pretty limited but from what I’ve gathered Arc support was here and then > got removed in the .4 update? > > I’ve seen there’s another fork that aims to be a comprehensive and easy to > install Arc focused Ollama instance, but it’d be really nice to just rely > on the main Ollama project and not have to juggle or flip between different > Ollama builds on my system especially if I change GPU vendors. I don’t > actually even know if the aforementioned fork is working right now either. > > But I’m sure most of us are aware of the new Battlemage GPUs and… yeah, > they’re yet again even more compelling than Arc was before. 16GB A770s are > $230 right now too with memory bandwidth that beats most of NVIDIA 40 > series. So I'm pretty antsy. Could use llama.cpp (I think?) but the Ollama > ecosystem is so awesome and would love to stick with it > > — > Reply to this email directly, view it on GitHub > <https://github.com/ollama/ollama/issues/1590#issuecomment-2529970982>, > or unsubscribe > <https://github.com/notifications/unsubscribe-auth/APLTL6BBEQGNLHIR35W5LPT2EY665AVCNFSM6AAAAABA2I642CVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDKMRZHE3TAOJYGI> > . > You are receiving this because you commented.Message ID: > ***@***.***> >
Author
Owner

@Kamryx commented on GitHub (Dec 10, 2024):

Yeah, this one here. It claims to require Ubuntu too; I'm running a different Linux flavor. That may just be a recommendation, idk.

https://github.com/mattcurf/ollama-intel-gpu

@Leo512bit

edit: looking again, it seems to just specify Ubuntu for Arc kernel support, so maybe that's not a problem

<!-- gh-comment-id:2530116827 --> @Kamryx commented on GitHub (Dec 10, 2024): Yeah this here. It claims to require Ubuntu too, I’m running a different Linux flavor. That may just be a recommendation, idk. https://github.com/mattcurf/ollama-intel-gpu @Leo512bit edit: looking again it seems to just specify Ubuntu for Arc kernel support so maybe that’s not a problem
Author
Owner

@DocMAX commented on GitHub (Dec 15, 2024):

time=2024-12-15T18:40:08.077Z level=INFO source=types.go:131 msg="inference compute" id=GPU-c54579ac-8f8e-f3a7-141b-fea32408ffd0 library=cuda variant=v12 compute=8.6 driver=12.7 name="NVIDIA GeForce RTX 3080" total="9.7 GiB" available="7.9 GiB"
time=2024-12-15T18:40:08.077Z level=INFO source=types.go:131 msg="inference compute" id=0 library=oneapi variant="" compute="" driver=0.0 name="Intel(R) Arc(TM) A380 Graphics" total="5.9 GiB" available="5.6 GiB"
time=2024-12-15T18:40:08.077Z level=INFO source=types.go:131 msg="inference compute" id=1 library=oneapi variant="" compute="" driver=0.0 name="Intel(R) Arc(TM) A310 LP Graphics" total="4.0 GiB" available="3.8 GiB"

This is what I get with Ollama 0.5.1. Does it mean they are supported? When running a model they are not used; only the NVIDIA card is.

<!-- gh-comment-id:2543997146 --> @DocMAX commented on GitHub (Dec 15, 2024): ``` time=2024-12-15T18:40:08.077Z level=INFO source=types.go:131 msg="inference compute" id=GPU-c54579ac-8f8e-f3a7-141b-fea32408ffd0 library=cuda variant=v12 compute=8.6 driver=12.7 name="NVIDIA GeForce RTX 3080" total="9.7 GiB" available="7.9 GiB" time=2024-12-15T18:40:08.077Z level=INFO source=types.go:131 msg="inference compute" id=0 library=oneapi variant="" compute="" driver=0.0 name="Intel(R) Arc(TM) A380 Graphics" total="5.9 GiB" available="5.6 GiB" time=2024-12-15T18:40:08.077Z level=INFO source=types.go:131 msg="inference compute" id=1 library=oneapi variant="" compute="" driver=0.0 name="Intel(R) Arc(TM) A310 LP Graphics" total="4.0 GiB" available="3.8 GiB" ``` This is what i get with Ollama 0.5.1. Does it mean they are supported? When running a model they are not used, only NVIDIA.
Author
Owner

@pauleseifert commented on GitHub (Dec 16, 2024):

Hey, @marcin-kruszynski Thank you for the response. Yes, I'm aware about OLLAMA_NUM_GPU setting. Tried different values, but OLLAMA_NUM_GPU=1 is only value when I managed to get stable performance. OLLAMA_NUM_GPU=2 works ok, but crashes sometimes. OLLAMA_NUM_GPU=999 crashes every time even on small models that should fit in VRAM. IDK, maybe it's somehow specific to my configuration. You totally right about older Ollama version used in IPEX image. It can't run llama3.2. I'm really looking forward to see #5059 working in the future releases, because as I understand the authors of Ollama aren't planning to add support of SYCL, One API or anything like that.

That's the same problem I have, @peremenov. I haven't figured out the cause, but I opened an issue at intel-analytics/ipex-llm#12513. OLLAMA_NUM_GPU values lower than the number of layers in the chosen model unfortunately mean that inference is offloaded to the CPU.

time=2024-12-15T18:40:08.077Z level=INFO source=types.go:131 msg="inference compute" id=GPU-c54579ac-8f8e-f3a7-141b-fea32408ffd0 library=cuda variant=v12 compute=8.6 driver=12.7 name="NVIDIA GeForce RTX 3080" total="9.7 GiB" available="7.9 GiB"
time=2024-12-15T18:40:08.077Z level=INFO source=types.go:131 msg="inference compute" id=0 library=oneapi variant="" compute="" driver=0.0 name="Intel(R) Arc(TM) A380 Graphics" total="5.9 GiB" available="5.6 GiB"
time=2024-12-15T18:40:08.077Z level=INFO source=types.go:131 msg="inference compute" id=1 library=oneapi variant="" compute="" driver=0.0 name="Intel(R) Arc(TM) A310 LP Graphics" total="4.0 GiB" available="3.8 GiB"

This is what i get with Ollama 0.5.1. Does it mean they are supported? When running a model they are not used, only NVIDIA.

@DocMAX What settings did you use?

<!-- gh-comment-id:2546768437 --> @pauleseifert commented on GitHub (Dec 16, 2024): > Hey, @marcin-kruszynski Thank you for the response. Yes, I'm aware about `OLLAMA_NUM_GPU` setting. Tried different values, but `OLLAMA_NUM_GPU=1` is only value when I managed to get stable performance. `OLLAMA_NUM_GPU=2` works ok, but crashes sometimes. `OLLAMA_NUM_GPU=999` crashes every time even on small models that should fit in VRAM. IDK, maybe it's somehow specific to my configuration. You totally right about older Ollama version used in IPEX image. It can't run llama3.2. I'm really looking forward to see #5059 working in the future releases, because as I understand the authors of Ollama aren't planning to add support of SYCL, One API or anything like that. That's the same problem I have @peremenov . Haven't figured out the cause but opened an issue at intel-analytics/ipex-llm#12513. Lower OLLAMA_NUM_GPU values than layers of the chosen model unfortunately mean that interference is offloaded to the CPU. > ``` > time=2024-12-15T18:40:08.077Z level=INFO source=types.go:131 msg="inference compute" id=GPU-c54579ac-8f8e-f3a7-141b-fea32408ffd0 library=cuda variant=v12 compute=8.6 driver=12.7 name="NVIDIA GeForce RTX 3080" total="9.7 GiB" available="7.9 GiB" > time=2024-12-15T18:40:08.077Z level=INFO source=types.go:131 msg="inference compute" id=0 library=oneapi variant="" compute="" driver=0.0 name="Intel(R) Arc(TM) A380 Graphics" total="5.9 GiB" available="5.6 GiB" > time=2024-12-15T18:40:08.077Z level=INFO source=types.go:131 msg="inference compute" id=1 library=oneapi variant="" compute="" driver=0.0 name="Intel(R) Arc(TM) A310 LP Graphics" total="4.0 GiB" available="3.8 GiB" > ``` > > This is what i get with Ollama 0.5.1. Does it mean they are supported? When running a model they are not used, only NVIDIA. @DocMAX What settings did you use?
Author
Owner

@DocMAX commented on GitHub (Dec 16, 2024):

No special settings, just the ollama package from Arch Linux. I also installed the Intel oneAPI libraries, of course.

<!-- gh-comment-id:2546787115 --> @DocMAX commented on GitHub (Dec 16, 2024): No special settings. Just the ollama package from arch linux. Also installed the Intel oneapi libraries of course.
Author
Owner

@vladislavdonchev commented on GitHub (Dec 21, 2024):

Yeah this here. It claims to require Ubuntu too, I’m running a different Linux flavor. That may just be a recommendation, idk.

https://github.com/mattcurf/ollama-intel-gpu

@Leo512bit

edit: looking again it seems to just specify Ubuntu for Arc kernel support so maybe that’s not a problem

@Kamryx
Oh man, I'm losing my mind here... Which WSL kernel version did you manage to get this working with?

I tried a couple, and even though clinfo lists the A770 GPUs, dmesg shows errors trying to load the driver... Cards are working perfectly fine on the Windows host.

I even filed an issue on ipex-llm:
https://github.com/intel-analytics/ipex-llm/issues/12592

<!-- gh-comment-id:2558144164 --> @vladislavdonchev commented on GitHub (Dec 21, 2024): > Yeah this here. It claims to require Ubuntu too, I’m running a different Linux flavor. That may just be a recommendation, idk. > > https://github.com/mattcurf/ollama-intel-gpu > > @Leo512bit > > edit: looking again it seems to just specify Ubuntu for Arc kernel support so maybe that’s not a problem @Kamryx Oh man, I'm losing my mind here... Which WSL kernel version did you manage to get this working with? I tried a couple, and even though clinfo lists the A770 GPUs, dmesg shows errors trying to load the driver... Cards are working perfectly fine on the Windows host. I even listed an issue on ipex-llm: https://github.com/intel-analytics/ipex-llm/issues/12592
Author
Owner

@DocMAX commented on GitHub (Jan 5, 2025):

OK, from what I understand, I can only use the IPEX "bundled" Ollama with Intel Arc cards. It worked with the right libraries installed (Arch Linux). But, as I understand it, I CAN'T mix GPU brands with Intel at the moment, right? We still need an official "ipex-runner".

<!-- gh-comment-id:2571493231 --> @DocMAX commented on GitHub (Jan 5, 2025): ok from what i understand i can only use the IPEX "bundled" ollama with Intel Arc cards. It worked with the right libraries installed (arch linux). but from what i understand is i CAN'T run multiple gpu brands with intel at the moment, right? we still need so official "ipex-runner".
Author
Owner

@uxdesignerhector commented on GitHub (Jan 23, 2025):

Yeah this here. It claims to require Ubuntu too, I’m running a different Linux flavor. That may just be a recommendation, idk.

https://github.com/mattcurf/ollama-intel-gpu

@Leo512bit

edit: looking again it seems to just specify Ubuntu for Arc kernel support so maybe that’s not a problem

I was able to run this using WSL2, which means full Windows compatibility. I just had to disable my integrated GPU in Windows Device Manager; otherwise I would encounter the error Error: llama runner process has terminated: exit status 2 when running ./ollama run qwen2.5-coder.

Could the Arc A770 be the cheapest AI card on the market right now? This thing is very fast; it is ageing like fine wine.

<!-- gh-comment-id:2611202968 --> @uxdesignerhector commented on GitHub (Jan 23, 2025): > Yeah this here. It claims to require Ubuntu too, I’m running a different Linux flavor. That may just be a recommendation, idk. > > https://github.com/mattcurf/ollama-intel-gpu > > [@Leo512bit](https://github.com/Leo512bit) > > edit: looking again it seems to just specify Ubuntu for Arc kernel support so maybe that’s not a problem I was able to run this using WSL2, that means full Windows compatibility. I just had to disable my integrated GPU from Windows device manager otherwise I would encounter next error: `Error: llama runner process has terminated: exit status 2` when running `./ollama run qwen2.5-coder`. Could be the Arc 770 the cheapest AI card on the market right now? This thing is very fast, it is ageing like fine wine
Author
Owner

@wbste commented on GitHub (Jan 24, 2025):

Yeah this here. It claims to require Ubuntu too, I’m running a different Linux flavor. That may just be a recommendation, idk.
https://github.com/mattcurf/ollama-intel-gpu
@Leo512bit
edit: looking again it seems to just specify Ubuntu for Arc kernel support so maybe that’s not a problem

I was able to run this using WSL2, that means full Windows compatibility. I just had to disable my integrated GPU from Windows device manager otherwise I would encounter next error: Error: llama runner process has terminated: exit status 2 when running ./ollama run qwen2.5-coder.

Could be the Arc 770 the cheapest AI card on the market right now? This thing is very fast, it is ageing like fine wine

It takes a few steps to set up, but the ipex-llm version of Ollama has been impressive. https://github.com/intel/ipex-llm/blob/main/docs/mddocs/Quickstart/ollama_quickstart.md

<!-- gh-comment-id:2611334259 --> @wbste commented on GitHub (Jan 24, 2025): > > Yeah this here. It claims to require Ubuntu too, I’m running a different Linux flavor. That may just be a recommendation, idk. > > https://github.com/mattcurf/ollama-intel-gpu > > [@Leo512bit](https://github.com/Leo512bit) > > edit: looking again it seems to just specify Ubuntu for Arc kernel support so maybe that’s not a problem > > I was able to run this using WSL2, that means full Windows compatibility. I just had to disable my integrated GPU from Windows device manager otherwise I would encounter next error: `Error: llama runner process has terminated: exit status 2` when running `./ollama run qwen2.5-coder`. > > Could be the Arc 770 the cheapest AI card on the market right now? This thing is very fast, it is ageing like fine wine It takes a few steps to setup but ipex version of ollama has been impressive. https://github.com/intel/ipex-llm/blob/main/docs/mddocs/Quickstart/ollama_quickstart.md
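For anyone wondering what "a few steps" means in practice, the Linux flow in that quickstart roughly amounts to the following (a sketch, not verified against the latest docs; the conda environment name and working directory are arbitrary):

# Install ipex-llm's bundled llama.cpp/Ollama into a fresh environment
conda create -n llm-cpp python=3.11 -y
conda activate llm-cpp
pip install --pre --upgrade "ipex-llm[cpp]"

# Symlink the bundled ollama binary into a working directory
mkdir -p ~/ollama-ipex && cd ~/ollama-ipex
init-ollama

# Start the server with the oneAPI runtime active and the quickstart's
# environment variables (OLLAMA_NUM_GPU=999 etc., see the earlier comments)
source /opt/intel/oneapi/setvars.sh
./ollama serve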
Author
Owner

@charlescng commented on GitHub (Jan 24, 2025):

I can run the image from https://github.com/mattcurf/ollama-intel-gpu stably on unRAID 7.0.0 (kernel 6.6 with the i915 driver) with an Arc A380, with the following environment variables passed in:

  • DEVICE=Arc
  • OLLAMA_MAX_LOADED_MODELS=1
  • OLLAMA_NUM_PARALLEL=1
  • OLLAMA_NUM_GPU=1
  • SYCL_CACHE_PERSISTENT=1

The Arc A380 only has 6 GB of VRAM, but the llama3.1:8b model runs on it. I get ~12 response tokens per second with that model, and around 60 tokens per second with llama3.2:1b.

<!-- gh-comment-id:2611403512 --> @charlescng commented on GitHub (Jan 24, 2025): I can run the image from https://github.com/mattcurf/ollama-intel-gpu stable on unRAID 7.0.0 (kernel 6.6 with i915 driver) with an Arc A380 with the following environment variables passed in: * DEVICE=Arc * OLLAMA_MAX_LOADED_MODELS=1 * OLLAMA_NUM_PARALLEL=1 * OLLAMA_NUM_GPU=1 * SYCL_CACHE_PERSITENT=1 The Arc A380 only has 6 GB of VRAM but the llama3.1:8b models runs on it. I get ~12 response tokens per second with that model. Around 60 tokens per second with llama3.2:1b.
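For anyone not on unRAID, roughly the same setup can be reproduced with a plain docker run (a sketch; the image tag is whatever your local build of the mattcurf/ollama-intel-gpu repo produced, so "ollama-intel-gpu" here is an assumption):

# Pass the GPU device node and the same environment variables into the container
docker run -d --name ollama-intel \
  --device /dev/dri \
  -e DEVICE=Arc \
  -e OLLAMA_MAX_LOADED_MODELS=1 \
  -e OLLAMA_NUM_PARALLEL=1 \
  -e OLLAMA_NUM_GPU=1 \
  -e SYCL_CACHE_PERSISTENT=1 \
  -p 11434:11434 \
  -v ollama-models:/root/.ollama \
  ollama-intel-gpu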
Author
Owner

@baoduy commented on GitHub (Feb 24, 2025):

Looking forward to native Intel Arc GPU support soon.
Currently I'm using the workaround at https://github.com/intel/ipex-llm/blob/main/docs/mddocs/Quickstart/ollama_quickstart.md, but I would still prefer native support from the Ollama team.

<!-- gh-comment-id:2677937773 --> @baoduy commented on GitHub (Feb 24, 2025): Looking forward to support Intel Arc GPU natively soon Currently, I'm using the workaround here https://github.com/intel/ipex-llm/blob/main/docs/mddocs/Quickstart/ollama_quickstart.md but still refer the native support from ollama team
Author
Owner

@huichuno commented on GitHub (May 10, 2025):

PyTorch 2.7 delivers significant functionality and performance enhancements on Intel GPU architectures to streamline AI workflows: https://pytorch.org/blog/pytorch-2-7-intel-gpus/

<!-- gh-comment-id:2868300647 --> @huichuno commented on GitHub (May 10, 2025): PyTorch 2.7 deliver significant functionality and performance enhancements on Intel GPU architectures to streamline AI workflows: https://pytorch.org/blog/pytorch-2-7-intel-gpus/
Author
Owner

@MaoJianwei commented on GitHub (Jul 14, 2025):

Can ollama use Intel integrated GPU to speed up inference? e.g. Intel UHD Graphics 630 of i5-10400 (https://github.com/ollama/ollama/issues/11411)

<!-- gh-comment-id:3068741059 --> @MaoJianwei commented on GitHub (Jul 14, 2025): [Can ollama use Intel integrated GPU to speed up inference?e.g. Intel UHD Graphics 630 of i5-10400](https://github.com/ollama/ollama/issues/11411)
Author
Owner

@ericcurtin commented on GitHub (Oct 13, 2025):

We added Vulkan support to Docker Model Runner, so we cover this feature:

https://www.docker.com/blog/docker-model-runner-vulkan-gpu-support/

We've also put effort into putting all our code in one central place to make it easier for people to contribute. Please star, fork, and contribute.

https://github.com/docker/model-runner

We have Vulkan support. You can pull models from Docker Hub, Hugging Face, or any other OCI registry. You can also push models to Docker Hub or any other OCI registry.

<!-- gh-comment-id:3399440784 --> @ericcurtin commented on GitHub (Oct 13, 2025): We added Vulkan support to docker model runner, so we cover this feature: https://www.docker.com/blog/docker-model-runner-vulkan-gpu-support/ We've also put effort to putting all our code in one central place to make it easier for people to contribute. Please star, fork and contribute. https://github.com/docker/model-runner We have vulkan support. You can pull models from Docker Hub, Huggingface or any other OCI registry. You can also push models to Docker Hub or any other OCI registry.
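For anyone who wants to try that route, the basic flow looks roughly like this (a sketch; the model name under Docker Hub's ai/ namespace is only an example):

# Pull a model from Docker Hub's ai/ namespace (any OCI registry works)
docker model pull ai/llama3.2

# Run a one-off prompt against it
docker model run ai/llama3.2 "Hello from an Intel Arc GPU"

# List locally available models
docker model list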
Author
Owner

@Xyz00777 commented on GitHub (Nov 9, 2025):

Heyho, what is the state of development? Sadly this has now been open for nearly 2 years :(

<!-- gh-comment-id:3508800455 --> @Xyz00777 commented on GitHub (Nov 9, 2025): heyho what is the state of developement? its sadly now open since nearly 2 years :(
Author
Owner

@MaoJianwei commented on GitHub (Nov 10, 2025):

heyho what is the state of developement? its sadly now open since nearly 2 years :(

https://github.com/ggml-org/llama.cpp/issues/1956

<!-- gh-comment-id:3509074384 --> @MaoJianwei commented on GitHub (Nov 10, 2025): > heyho what is the state of developement? its sadly now open since nearly 2 years :( https://github.com/ggml-org/llama.cpp/issues/1956
Author
Owner

@Xyz00777 commented on GitHub (Nov 12, 2025):

Using llama.cpp instead of Ollama is a workaround, not a solution, and the same goes for the experimental Vulkan support... :/
At least as far as I understood, Ollama is not on the current state of llama.cpp(?), and because of that it's not working in Ollama.

<!-- gh-comment-id:3521088821 --> @Xyz00777 commented on GitHub (Nov 12, 2025): its a workaround to use llama.cpp instead of ollama but not a solution as well as the experimental vulkan support... :/ at least as far as i understood ollama is not on the current state of llama.cpp(?) and based on that its not working in ollama
Reference: github-starred/ollama#26640