[GH-ISSUE #2169] Inference with OpenVINO on Intel #47753

Open
opened 2026-04-28 05:10:23 -05:00 by GiteaMirror · 48 comments

Originally created by @ddpasa on GitHub (Jan 24, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/2169

I think Intel CPUs/GPUs now support more efficient inference with OpenVINO. See example here with LLAVA: https://docs.openvino.ai/2023.2/notebooks/257-llava-multimodal-chatbot-with-output.html

It would be great if ollama could automatically default to OpenVINO on Intel systems.
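For context, inference through OpenVINO usually goes through the OpenVINO GenAI runtime rather than llama.cpp. A minimal sketch of what that looks like in Python, assuming the `openvino_genai` package is installed and `./llama3-ov` is a placeholder directory containing a model already converted to OpenVINO IR (both the path and the device choice are illustrative, not ollama code):

```python
# Minimal OpenVINO GenAI inference sketch (illustrative only).
# Assumes: pip install openvino-genai, and a model exported to OpenVINO IR
# (e.g. via optimum-cli) sitting in ./llama3-ov.
import openvino_genai as ov_genai

# Device can be "CPU", "GPU" (Intel iGPU/dGPU) or "NPU" where supported.
pipe = ov_genai.LLMPipeline("./llama3-ov", "GPU")

# Generate a short completion; max_new_tokens bounds the output length.
print(pipe.generate("Describe OpenVINO in one sentence.", max_new_tokens=64))
```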

GiteaMirror added the feature request label 2026-04-28 05:10:23 -05:00

@Kreijstal commented on GitHub (Jul 17, 2024):

this would be great; there is no need to default to CUDA if you don't own the hardware, it should default to your own hardware...


@Kreijstal commented on GitHub (Jul 17, 2024):

so either build for all systems and decide at runtime which to use, or ship two different packages, ollama-cuda and ollama-openvino, right?


@Kreijstal commented on GitHub (Jul 17, 2024):

I mean currently it simply installs CUDA things my hardware doesn't even support; it should at least ask whether I have CUDA!


@ddpasa commented on GitHub (Jul 18, 2024):

There will soon be a Vulkan backend for ollama. I'm not sure if OpenVINO is still needed once that starts working.


@enricorampazzo commented on GitHub (Jul 22, 2024):

> There will soon be a Vulkan backend for ollama. I'm not sure if OpenVINO is still needed once that starts working.

OpenVINO also supports the NPU, which I think would be useful: I ran a simple test last night, asking llama3-8B-Instruct "how are you"; the CPU answered in 64 minutes, the NPU did the same in 55 seconds :)


@Kreijstal commented on GitHub (Jul 22, 2024):

> > There will soon be a Vulkan backend for ollama. I'm not sure if OpenVINO is still needed once that starts working.
>
> OpenVINO also supports the NPU, which I think would be useful: I ran a simple test last night, asking llama3-8B-Instruct "how are you"; the CPU answered in 64 minutes, the NPU did the same in 55 seconds :)

h-how do you compile this on your machine?


@luhuaei commented on GitHub (Jul 22, 2024):

> > There will soon be a Vulkan backend for ollama. I'm not sure if OpenVINO is still needed once that starts working.
>
> OpenVINO also supports the NPU, which I think would be useful: I ran a simple test last night, asking llama3-8B-Instruct "how are you"; the CPU answered in 64 minutes, the NPU did the same in 55 seconds :)

How did you manage to do it? According to the [documentation](https://docs.openvino.ai/2024/openvino-workflow/running-inference/inference-devices-and-modes/npu-device.html#limitations), the current OpenVINO NPU only supports models with static shapes. However, as far as I know, Llama3 is a causal (autoregressive) model, so the OpenVINO NPU should not be able to run it. Thank you for your guidance.


@enricorampazzo commented on GitHub (Jul 22, 2024):

> > > There will soon be a Vulkan backend for ollama. I'm not sure if OpenVINO is still needed once that starts working.
> >
> > OpenVINO also supports the NPU, which I think would be useful: I ran a simple test last night, asking llama3-8B-Instruct "how are you"; the CPU answered in 64 minutes, the NPU did the same in 55 seconds :)
>
> How did you manage to do it? According to the documentation, the current OpenVINO NPU only supports models with static shapes. However, as far as I know, Llama3 is a causal model, so the OpenVINO NPU should not be able to run it. Thank you for your guidance.

So, I am not an expert in any way, shape or form: I bought a laptop with an Ultra 7 processor (a 12th-gen ThinkPad X1 Carbon) on Friday, on Saturday I found out it has an NPU, and today I am running some simple benchmarks, so take what follows with a lot of salt.

I found the repo for the [Intel NPU library](https://github.com/intel/intel-npu-acceleration-library). It comes with several examples, including one that [runs Llama3 on the NPU](https://github.com/intel/intel-npu-acceleration-library/blob/main/examples/llama3.py). I had to solve/work around some issues that I have since opened (see [here](https://github.com/intel/intel-npu-acceleration-library/issues/101) and [here](https://github.com/intel/intel-npu-acceleration-library/issues/102)), but overall it works, and the difference between CPU and NPU time is quite astounding.
I know it is running on the NPU by looking at the performance tab in Task Manager, which has a separate row for the NPU.

As I said, I am running benchmarks and I will keep you posted if you are interested.
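For reference, the basic pattern that library documents is to load a regular Hugging Face model and then compile it for the NPU. A minimal sketch, assuming the `intel_npu_acceleration_library` and `transformers` packages are installed; the model id and dtype below are illustrative assumptions, not taken from the linked example:

```python
# Sketch: offload a Hugging Face causal LM to the Intel NPU via
# intel-npu-acceleration-library (illustrative, not ollama code).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
import intel_npu_acceleration_library

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # assumed; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)

# Quantize and compile the model for the NPU; int8 keeps memory manageable.
model = intel_npu_acceleration_library.compile(model, dtype=torch.int8)

inputs = tokenizer("how are you", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```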


@luhuaei commented on GitHub (Jul 23, 2024):

@Giustiniano Thank you for your explanation. I'll give it a try. Thank you.


@enricorampazzo commented on GitHub (Jul 23, 2024):

Here you can see a demonstration of llama-3 running on the NPU
https://youtu.be/p6Ohv8JXJF8


@taxmeifyoucan commented on GitHub (Jul 29, 2024):

+1 this would be very helpful


@alexma233 commented on GitHub (Aug 8, 2024):

I was wondering if there have been any updates or progress on this issue. Is there a timeline for resolving this or any workaround available? Thank you!


@divemasterjm commented on GitHub (Oct 1, 2024):

+1


@jsapede commented on GitHub (Oct 3, 2024):

+2


@johnmmcgee commented on GitHub (Oct 6, 2024):

Would be interested as well.


@liyimeng commented on GitHub (Oct 28, 2024):

+3


@ghchris2021 commented on GitHub (Nov 13, 2024):

it'd be nice to see!


@awaLiny2333 commented on GitHub (Feb 4, 2025):

+4


@thesolomon-tech commented on GitHub (Feb 13, 2025):

I think it would be good to aggregate issue notifications.
OpenVINO seems to have Intel NPU support according to their [GitHub page](https://github.com/openvinotoolkit/openvino).
The relevant issues are #5747, #8281 and #3004.


@dmbuil commented on GitHub (Feb 21, 2025):

+5!


@zhaohb commented on GitHub (Mar 24, 2025):

Hi, all
We have implemented the integration of OpenVINO and Ollama: https://github.com/zhaohb/ollama_ov or https://github.com/openvinotoolkit/openvino_contrib/pull/953


@Kreijstal commented on GitHub (Mar 24, 2025):

@zhaohb pr when


@ddpasa commented on GitHub (Mar 25, 2025):

> Hi all, we have implemented the integration of OpenVINO and Ollama: https://github.com/zhaohb/ollama_ov or openvinotoolkit/openvino_contrib#953

@zhaohb, do you plan to contribute this as a supported backend to llama.cpp?


@zhaohb commented on GitHub (Mar 25, 2025):

> > Hi all, we have implemented the integration of OpenVINO and Ollama: https://github.com/zhaohb/ollama_ov or openvinotoolkit/openvino_contrib#953
>
> @zhaohb, do you plan to contribute this as a supported backend to llama.cpp?

Hi, I guess this won't be a backend for llama.cpp; it's just a backend for Ollama.


@zhaohb commented on GitHub (Mar 26, 2025):

> @zhaohb pr when

Yes, I want to upstream it, but I don't know if it will be merged.


@FionaZZ92 commented on GitHub (Mar 26, 2025):

Ollama_OV uses OpenVINO GenAI as the backend for inference on Intel platforms, including CPU/GPU/NPU, and it ensures there is no performance gap against OpenVINO's published numbers on Intel platforms. We appreciate the user-friendly interface that Ollama provides to the open community, so the goal is to make it quick and easy for the community to use Ollama while getting the good performance that Intel tracks and maintains. An OV backend for Ollama will be helpful to both the Ollama and OpenVINO ecosystems.

We are open and delighted to upstream this to Ollama as one of the options. We look forward to anyone who can help move the work along.


@antoniomtz commented on GitHub (Apr 10, 2025):

+6


@okc0mputex commented on GitHub (Apr 24, 2025):

Is there an update on this? This will help several projects.


@FionaZZ92 commented on GitHub (Apr 24, 2025):

> Is there an update on this? This will help several projects.

As a shortcut, you can refer to this repo, where OpenVINO serves as a backend for Ollama:
https://github.com/openvinotoolkit/openvino_contrib/tree/master/modules/ollama_openvino
BTW, we are also working on reading models from GGUF files. Once that is done, I think it will be easier to integrate seamlessly with Ollama.
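Until GGUF reading lands, models generally have to be converted to OpenVINO IR before an OpenVINO-based backend can serve them. A minimal sketch of that conversion with optimum-intel, assuming the `optimum[openvino]` extra is installed; the model id and output directory are placeholders, not part of ollama_openvino:

```python
# Sketch: export a Hugging Face model to OpenVINO IR for use with an
# OpenVINO runtime (illustrative; model id and paths are placeholders).
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

model_id = "Qwen/Qwen2.5-1.5B-Instruct"  # assumed example model
model = OVModelForCausalLM.from_pretrained(model_id, export=True)  # convert on load
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Save the converted IR so an OpenVINO runtime can load it directly.
model.save_pretrained("qwen2.5-1.5b-ov")
tokenizer.save_pretrained("qwen2.5-1.5b-ov")
```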


@brownplayer commented on GitHub (May 1, 2025):

> > Is there an update on this? This will help several projects.
>
> As a shortcut, you can refer to this repo, where OpenVINO serves as a backend for Ollama: https://github.com/openvinotoolkit/openvino_contrib/tree/master/modules/ollama_openvino By the way, we are also working on reading models from GGUF files. Once that is done, I think it will be easier to integrate seamlessly with Ollama.

There was a serious problem at that time: the CPU, iGPU and NPU were all unusable.


@emaayan commented on GitHub (Jun 12, 2025):

hi, any update on this?


@FionaZZ92 commented on GitHub (Jun 12, 2025):

> hi, any update on this?

We have added new model support here; you can refer to: https://github.com/openvinotoolkit/openvino_contrib/tree/master/modules/ollama_openvino

Because the GGUF model size is not the smallest (it contains many Q6 layers), at execution time it can only fall back to Q8/FP16, so the performance is not the best. If you are after a performance boost on Intel platforms, I suggest you follow the link above. Thank you.


@ddpasa commented on GitHub (Jun 12, 2025):

llama.cpp with the Vulkan backend works very well with Intel GPUs; you can find benchmarks here: https://github.com/ggml-org/llama.cpp/discussions/10879


@emaayan commented on GitHub (Jun 12, 2025):

> llama.cpp with the Vulkan backend works very well with Intel GPUs; you can find benchmarks here: ggml-org/llama.cpp#10879

I don't exactly know what a Vulkan backend is, but I have a Lenovo T14 Gen 5 with no GPU, just an Intel Core Ultra 7 165U and an NPU.
I'm looking specifically for ollama so I could use the JetBrains integration with it.

> > hi, any update on this?
>
> We have added new model support here; you can refer to: openvinotoolkit/openvino_contrib@master/modules/ollama_openvino
>
> Because the GGUF model size is not the smallest (it contains many Q6 layers), at execution time it can only fall back to Q8/FP16, so the performance is not the best. If you are after a performance boost on Intel platforms, I suggest you follow the link above. Thank you.


@emaayan commented on GitHub (Jun 12, 2025):

> > hi, any update on this?
>
> We have added new model support here; you can refer to: openvinotoolkit/openvino_contrib@master/modules/ollama_openvino
>
> Because the GGUF model size is not the smallest (it contains many Q6 layers), at execution time it can only fall back to Q8/FP16, so the performance is not the best. If you are after a performance boost on Intel platforms, I suggest you follow the link above. Thank you.

Hi, I actually tried using your stuff, but I have 2 issues. The first is that trying to run your exe (which I feel rather uncomfortable downloading from a Google Drive) failed with this message (and I did download the environment).

And when I tried to compile it, it also failed with compilation errors on includes.

![Image](https://github.com/user-attachments/assets/4e7fdab8-7aa8-49e3-bdca-3fe732f61da5)


@ddpasa commented on GitHub (Jun 12, 2025):

> > llama.cpp with the Vulkan backend works very well with Intel GPUs; you can find benchmarks here: ggml-org/llama.cpp#10879
>
> I don't exactly know what a Vulkan backend is, but I have a Lenovo T14 Gen 5 with no GPU, just an Intel Core Ultra 7 165U and an NPU. I'm looking specifically for ollama so I could use the JetBrains integration with it.

The Intel 165U has an iGPU that supports Vulkan, so you should see a major performance gain in prompt processing and image processing from using the Vulkan backend. My much older Iris G7 gets approximately a 2x speedup; you will likely get more.
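For anyone trying the Vulkan route, llama.cpp's bundled `llama-server` exposes an OpenAI-compatible HTTP API regardless of which backend it was built with. A minimal sketch of querying it from Python, assuming a server is already running and listening on `http://localhost:8080` (the port, model name and timeout below are assumptions):

```python
# Sketch: query a local llama.cpp llama-server via its OpenAI-compatible API.
# Assumes the server is already running (e.g. a Vulkan build) on localhost:8080.
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "local",  # not meaningful for a single-model server
        "messages": [{"role": "user", "content": "how are you"}],
        "max_tokens": 64,
    },
    timeout=300,
)
print(resp.json()["choices"][0]["message"]["content"])
```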


@emaayan commented on GitHub (Jun 12, 2025):

> > I don't exactly know what a Vulkan backend is, but I have a Lenovo T14 Gen 5 with no GPU, just an Intel Core Ultra 7 165U and an NPU. I'm looking specifically for ollama so I could use the JetBrains integration with it.
>
> The Intel 165U has an iGPU that supports Vulkan, so you should see a major performance gain in prompt processing and image processing from using the Vulkan backend. My much older Iris G7 gets approximately a 2x speedup; you will likely get more.

So this leaves the question of whether it can be used as an Ollama server, because JetBrains integrates either with ollama or LM Studio.


@FionaZZ92 commented on GitHub (Jun 12, 2025):

> Hi, I actually tried using your stuff, but I have 2 issues. The first is that trying to run your exe (which I feel rather uncomfortable downloading from a Google Drive) failed with this message (and I did download the environment). And when I tried to compile it, it also failed with compilation errors on includes.

To run it successfully, you should follow the steps in the repo above and download the OpenVINO GenAI package as well. If you have any usage questions about Ollama-OV, feel free to ask there and we will help answer: https://github.com/openvinotoolkit/openvino_contrib/issues


@ghchris2021 commented on GitHub (Jun 21, 2025):

> llama.cpp with the Vulkan backend works very well with Intel GPUs; you can find benchmarks here: ggml-org/llama.cpp#10879

I have recently started to revisit which inference and UI / local OpenAI API server options work for me on an Intel GPU.

A week or so ago I tried ggml.org's own llama.cpp releases with their different builds / back ends, SYCL and Vulkan, and as far as I recall the SYCL option often delivered more performance on the same GGUF model files, with Vulkan noticeably slower in some cases.
The SYCL build was a few weeks older than the Vulkan build at that time, but there are now newer ones from yesterday for both, which I have not tried.

I also tried the most recently available ipex-llm inference software from Intel, which IIRC uses a heavily modified llama.cpp (or maybe I'm wrong) and somehow uses Intel IPEX as part of the inference process. When I tested it, I found it very significantly faster than either the ggml.org llama.cpp SYCL or Vulkan builds, almost 2x faster in one relevant case that I somewhat recall.

I have not since had time to benchmark ollama, Hugging Face transformers, or ONNX for inference of the same models, but those are also possible inference options which might have particular advantages and disadvantages for some use cases.

@emaayan commented on GitHub (Jun 22, 2025):

> I also tried the most recently available ipex-llm inference software from Intel, which IIRC uses a heavily modified llama.cpp (or maybe I'm wrong) and somehow uses Intel IPEX as part of the inference process. When I tested it, I found it very significantly faster than either the ggml.org llama.cpp SYCL or Vulkan builds, almost 2x faster in one relevant case that I somewhat recall.

Are you talking about this thing? https://github.com/intel/ipex-llm/blob/main/docs/mddocs/Quickstart/ollama_quickstart.md


@ghchris2021 commented on GitHub (Jun 22, 2025):

> Are you talking about this thing? https://github.com/intel/ipex-llm/blob/main/docs/mddocs/Quickstart/ollama_quickstart.md

Yes, sort of. I didn't specifically use the ollama tie-in, just the basic ipex-llm release, which has llama.cpp-like tool binaries for benchmarking, serving, etc. So I ran some benchmarks to compare it against the SYCL and Vulkan builds of ggml.org's llama.cpp benchmark tool.

https://github.com/intel/ipex-llm/releases

https://github.com/ipex-llm/ipex-llm/releases/tag/v2.3.0-nightly

The ipex-llm "nightly build" doesn't actually seem to have been published recently at those links, but maybe there are newer builds somewhere; otherwise I suspect there will eventually be new versions. Anyway, it was one more thing to test.

Compared to the ggml.org llama.cpp release builds, IIRC I found ipex-llm to be noticeably faster in one interesting benchmark use case I tried. I suspect that would also translate to actual "server" and "ollama" performance when they are based on the same underlying inference code version.

For points of comparison, there are the 'full-intel' and 'full-vulkan' ggml.org llama.cpp builds / versions such as these, which are fairly recent (as of the past day):

https://github.com/ggml-org/llama.cpp/pkgs/container/llama.cpp/versions

https://github.com/ggml-org/llama.cpp/pkgs/container/llama.cpp/443832581?tag=full-intel-b5732

https://github.com/ggml-org/llama.cpp/pkgs/container/llama.cpp/443831174?tag=full-vulkan-b5732

And when I get around to it, I'll compare with ollama, Hugging Face transformers in various configurations, and the newest OpenVINO.
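For what it's worth, the ipex-llm path also has a Python API rather than being only a drop-in ollama replacement. A minimal sketch of its documented transformers-style usage, assuming an `ipex-llm[xpu]` install and an Intel GPU visible as the `xpu` device; the model id is a placeholder:

```python
# Sketch: 4-bit inference on an Intel GPU ("xpu") with ipex-llm
# (illustrative; not the ollama integration itself).
import torch
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM

model_id = "Qwen/Qwen2.5-Coder-7B-Instruct"  # assumed example model
model = AutoModelForCausalLM.from_pretrained(model_id, load_in_4bit=True)
model = model.to("xpu")  # move the quantized model onto the Intel GPU
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("how are you", return_tensors="pt").to("xpu")
with torch.inference_mode():
    out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```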


@emaayan commented on GitHub (Jun 22, 2025):

I tried the 2.3.0 ollama build with Qwen2.5 Coder Instruct on my Intel Core Ultra 165U and it didn't seem to be working any faster when I asked it to analyze a Java class.


@kamikaze commented on GitHub (Jan 14, 2026):

Looks like an already abandoned project


@ddpasa commented on GitHub (Jan 14, 2026):

> Looks like an already abandoned project

Just use llama.cpp; they already support Intel GPUs.


@kamikaze commented on GitHub (Jan 14, 2026):

> > Looks like an already abandoned project
>
> Just use llama.cpp; they already support Intel GPUs.

but not the NPU


@ddpasa commented on GitHub (Jan 15, 2026):

> > > Looks like an already abandoned project
> >
> > Just use llama.cpp; they already support Intel GPUs.
>
> but not the NPU

Open a ticket with the llama.cpp folks. They are great at this kind of stuff.


@rklec commented on GitHub (Jan 27, 2026):

See https://github.com/ggml-org/llama.cpp/issues/5079 and https://github.com/ggml-org/llama.cpp/issues/9181; it's apparently a recurring request.

There is apparently also a repo by the Intel people: https://github.com/intel/ipex-llm


@jclab-joseph commented on GitHub (Mar 30, 2026):

I hope OpenVINO gets integrated into Ollama.
Maybe the llama.cpp work can serve as a reference: https://github.com/ggml-org/llama.cpp/pull/15307

Reference: github-starred/ollama#47753