[GH-ISSUE #8777] Adding support for Intel GPUs #5699

Closed
opened 2026-04-12 16:59:33 -05:00 by GiteaMirror · 11 comments

Originally created by @Maksim3l on GitHub (Feb 2, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/8777

I have trouble running deepseek on my laptop, which has a GPU with 8GB of VRAM. Granted, this is not a lot, but I want to use it for a personal project and send API requests to it for generating content, and for that purpose and the size of the 8b model it should be enough. However, I noticed that it's eating away at my CPU, crippling my laptop's performance while I'm trying to debug.

I noticed that Ollama seems to have no easy way to switch between GPUs and doesn't even have native support for Intel GPUs, even though my machine should be able to run this model while I work.

I understand that I could solve this problem by throwing money at it; however, I currently don't have the spare funds to build an entire dedicated rig for this.

GPU: Intel(R) Iris(R) Xe Graphics

PS: Yes, I have looked at ipex-llm; no, I don't understand the instructions, plus it doesn't seem to add support for Iris...

GiteaMirror added the feature request label 2026-04-12 16:59:33 -05:00

@NeoZhangJianyu commented on GitHub (Feb 5, 2025):

There is a similar issue about Intel GPU support: https://github.com/ollama/ollama/issues/8414.
I plan to make ollama support Intel GPUs locally.
If your Intel GPU is supported by llama.cpp, it would be supported by ollama too.


@n3thshan commented on GitHub (Feb 16, 2025):

@NeoZhangJianyu I'm not a developer, but I just want to use local LLMs for privacy, and I have some questions. I have a built-in Arc GPU (MTL) and I know llama.cpp supports it. However, as I understand it, some manual compiling is required to get Intel GPUs hardware-accelerated when running models, right? Or does Ollama support GPU acceleration for my exact model out of the box? TIA!


@NeoZhangJianyu commented on GitHub (Feb 17, 2025):

Once ollama restores support for the llama.cpp SYCL backend, ollama will support MTL out of the box.
But right now, it doesn't.

You could download and try the llama.cpp Windows binary directly now, if you don't want to compile the code.
Note that you need to install the GPU driver manually before running it.
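
For reference, here is a minimal sketch (assuming the oneAPI DPC++ compiler, `icpx -fsycl`) that lists the SYCL devices a SYCL build of llama.cpp would see; if your Intel GPU shows up here, the driver and runtime are in place:

```cpp
// Minimal sketch: enumerate SYCL devices, the same view a SYCL-enabled
// llama.cpp build gets. Build with: icpx -fsycl list_devices.cpp
#include <sycl/sycl.hpp>
#include <iostream>

int main() {
    for (const auto &dev : sycl::device::get_devices()) {
        std::cout << (dev.is_gpu() ? "[GPU]   " : "[other] ")
                  << dev.get_info<sycl::info::device::name>() << "\n";
    }
    return 0;
}
```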


@n3thshan commented on GitHub (Feb 17, 2025):

> Note that you need to install the GPU driver manually before running it.

@NeoZhangJianyu I'm on Linux. Do I have to install a different graphics driver as well, or is the one used by the kernel (6.12.9) enough?

> Once ollama restores support for the llama.cpp SYCL backend, ollama will support MTL out of the box

Also, is there an ongoing PR here that you can link so I can stay updated on this?


@NeoZhangJianyu commented on GitHub (Feb 19, 2025):

Yes, you need to install the Intel GPU driver manually; refer to https://dgpu-docs.intel.com/driver/client/overview.html.
That makes sure the driver is the latest.
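
As a quick check that the user-space compute driver is installed correctly, a minimal sketch (assuming the Level Zero loader and headers are installed; link with `-lze_loader`):

```cpp
// Minimal sketch: initialize Level Zero and count the available drivers.
// If this fails or reports 0 drivers, the Intel compute runtime is missing.
#include <level_zero/ze_api.h>
#include <cstdio>

int main() {
    if (zeInit(ZE_INIT_FLAG_GPU_ONLY) != ZE_RESULT_SUCCESS) {
        std::printf("Level Zero init failed -- GPU compute driver not installed?\n");
        return 1;
    }
    uint32_t driverCount = 0;
    zeDriverGet(&driverCount, nullptr);
    std::printf("Level Zero drivers found: %u\n", driverCount);
    return 0;
}
```

The kernel graphics driver alone is not enough; the user-space compute runtime from the link above is what Level Zero and SYCL talk to.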

There is no PR for it yet.
I'm waiting for Ollama to finish the engine refactor. Then the work will begin.


@n3thshan commented on GitHub (Feb 19, 2025):

Great! Thanks for your contributions @NeoZhangJianyu! Also, for iGPU acceleration I'm currently using another piece of software called [ramalama](https://github.com/containers/ramalama), which supports it via SYCL. However, I get a warning saying:

```
Loading modelget_memory_info: [warning] ext_intel_free_memory is not supported (export/set ZES_ENABLE_SYSMAN=1 to support), use total memory as free memory
```

Even if I export the variable, it still shows up. Do you know why?


@NeoZhangJianyu commented on GitHub (Feb 20, 2025):

I don't know this software.
But it looks like that log message was written by me for llama.cpp. :)

That log means the iGPU does not support the API used to detect the GPU's free memory, so the total memory is used as the free memory instead.

If you only run one instance on the iGPU, there is no negative effect. Ignore it.
If you run more than one instance, it can cause a memory allocation error when the combined requirements exceed the physical memory.
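
For reference, a minimal sketch (assuming the oneAPI DPC++ compiler) of how an application can query free GPU memory through the `ext_intel_free_memory` extension and fall back to total memory when it is unsupported, which is what the warning describes:

```cpp
// Minimal sketch: query free device memory via the Intel SYCL extension,
// falling back to total memory when the device/driver does not support it.
// ZES_ENABLE_SYSMAN=1 usually has to be set for the underlying Sysman query.
#include <sycl/sycl.hpp>
#include <iostream>

int main() {
    sycl::device dev{sycl::gpu_selector_v};
    const size_t total = dev.get_info<sycl::info::device::global_mem_size>();
    size_t free_mem = total;  // fallback: treat all device memory as free

    if (dev.has(sycl::aspect::ext_intel_free_memory)) {
        free_mem = dev.get_info<sycl::ext::intel::info::device::free_memory>();
    } else {
        std::cout << "ext_intel_free_memory not supported; using total memory\n";
    }

    std::cout << "total: " << total << " bytes, free: " << free_mem << " bytes\n";
    return 0;
}
```

With the fallback, a single instance is fine, but several instances each assuming the whole GPU is free can over-allocate and fail.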


@n3thshan commented on GitHub (Feb 20, 2025):

@NeoZhangJianyu I'm wishing you the best with your contributions to make Intel iGPUs work in Ollama! Thanks for your replies!


@AlanyTan commented on GitHub (May 8, 2025):

The issue is closed, but it does not seem like Intel GPUs are natively supported by Ollama yet?
What is the status of this feature?


@NeoZhangJianyu commented on GitHub (May 25, 2025):

I have implemented it.
Please refer to: https://github.com/ollama/ollama/issues/8414


@technout commented on GitHub (Jul 6, 2025):

Because it looks like the official Ollama repo is not going to implement this anytime soon, please look at this blog post:
"Optimized Local AI on Intel Arc B580 with OpenWebUI and Ollama, using Fedora Linux and Podman"
https://syslynx.net/llm-intel-b580-linux
It works very well for me on my Manjaro system with an AMD Ryzen CPU and an Intel Arc A380 GPU!

Reference: github-starred/ollama#5699