[GH-ISSUE #6494] igpu #66125

Closed
opened 2026-05-04 00:07:48 -05:00 by GiteaMirror · 6 comments

Originally created by @ayttop on GitHub (Aug 24, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/6494

Ollama with an Intel iGPU:
How do I run Ollama on an Intel iGPU?

GiteaMirror added the feature request label 2026-05-04 00:07:48 -05:00

@hxhue commented on GitHub (Aug 26, 2024):

There is a related PR: https://github.com/ollama/ollama/pull/5593

The contributor released [an experimental package](https://github.com/zhewang1-intc/ollama/releases) last month. After installing Intel's oneAPI package and setting the environment variables described in the PR, I managed to use my Intel Core Ultra 9 185H to run models. My machine is a Windows laptop.

The speed is only slightly faster than CPU inference, but I no longer see 100% CPU usage, and the fan has stopped whining like crazy.
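
For comparing the iGPU build against plain CPU inference, a quick way to get numbers is Ollama's REST API, which reports token counts and timings per request. A minimal sketch, assuming a local server on the default port 11434 and that some model is already pulled (the model name below is just a placeholder):

```python
import json
import urllib.request

# Ask the local Ollama server for a non-streaming completion; the
# response includes eval_count and eval_duration fields.
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps({
        "model": "llama3",  # placeholder: any model you have pulled
        "prompt": "Why is the sky blue?",
        "stream": False,
    }).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    body = json.load(resp)

# eval_duration is in nanoseconds; tokens/s makes CPU vs. iGPU runs comparable.
tokens_per_s = body["eval_count"] / (body["eval_duration"] / 1e9)
print(f"{body['eval_count']} tokens at {tokens_per_s:.1f} tok/s")
```

Running the same prompt once with the stock CPU build and once with the experimental iGPU build gives a like-for-like tokens-per-second comparison.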


@byjrack commented on GitHub (Aug 26, 2024):

I'm in the same boat, but with an older Raptor Lake/13th-gen Xe-LP chip. Hoping to see the work in the fork merged into an official build sooner rather than later. Performance is not awesome, and I'm not sure how much more they can squeeze out of oneAPI-supported chips, but it avoids the CPU contention that makes the machine unresponsive during use.


@hxhue commented on GitHub (Aug 26, 2024):

@byjrack The iGPU doesn't rely on the CPU to compute. They are two different parts, so the machine is still responsive. 😂


@byjrack commented on GitHub (Aug 26, 2024):

Yup, well aware of that, and it's one of the benefits of using it vs. AVX2. I will say the fans still go to 11, since both burn a ton of electrons to make things go compared to, say, Metal on aarch64, but I'll take what I can get.


@dhiltgen commented on GitHub (Aug 27, 2024):

Let's track this via #3113

@ayttop commented on GitHub (Aug 28, 2024):

It runs with https://github.com/intel-analytics/ipex-llm/blob/main/docs/mddocs/Quickstart/llama_cpp_quickstart.md
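
For anyone following that route, the quickstart boils down to setting up oneAPI and launching llama.cpp's SYCL build with layers offloaded to the GPU. A hedged sketch of the launch step, assuming the binaries from the linked quickstart; the binary name, model path, and environment variables are assumptions taken from that doc and may differ by release and platform:

```python
import os
import subprocess

# Environment suggested by the ipex-llm quickstart (assumptions; verify
# against the linked doc): persist the compiled SYCL kernel cache and
# enable the Level Zero sysman API for device discovery.
env = dict(os.environ,
           SYCL_CACHE_PERSISTENT="1",
           ZES_ENABLE_SYSMAN="1")

# llama-cli is the llama.cpp CLI shipped with the ipex-llm binaries
# (name may vary by release); -ngl 99 offloads all layers to the (i)GPU.
subprocess.run(
    ["./llama-cli",
     "-m", "models/model.gguf",  # placeholder GGUF model path
     "-p", "Why is the sky blue?",
     "-n", "64",                 # generate 64 tokens
     "-ngl", "99"],              # offload all layers to the GPU
    env=env,
    check=True,
)
```

On Linux you would typically `source /opt/intel/oneapi/setvars.sh` in the same shell before launching this, so the SYCL runtime can find the GPU.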