[GH-ISSUE #3113] Integrated Intel GPU support #27673

Open
opened 2026-04-22 05:11:54 -05:00 by GiteaMirror · 33 comments

Originally created by @clvgt12 on GitHub (Mar 13, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/3113

Originally assigned to: @dhiltgen on GitHub.

Hello,

Please consider adapting Ollama to use Intel Integrated Graphics Processors (such as the Intel Iris Xe Graphics cores) in the future.

GiteaMirror added the intel and feature request labels 2026-04-22 05:11:54 -05:00

@ddpasa commented on GitHub (Mar 13, 2024):

Take a look at: https://github.com/ollama/ollama/pull/2578


@clvgt12 commented on GitHub (Mar 13, 2024):

Very nice! Looking forward to testing it on my Windows PC running Ollama in the future!


@vinaykharayat commented on GitHub (Mar 27, 2024):

+1


@MarkWard0110 commented on GitHub (Apr 19, 2024):

For anyone who has an Intel integrated GPU, the otherwise idle GPU would add an additional device to utilize. Even if it were limited to 3 GB, that would be an extra 3 GB of GPU memory; today that 3 GB goes unused when a model is split between an Nvidia GPU and the CPU.
I am running a headless server, and the integrated GPU is sitting there doing nothing to help.


@carlos-burelo commented on GitHub (Apr 27, 2024):

+1


@sspanogle commented on GitHub (Jun 3, 2024):

+1


@alexb7373 commented on GitHub (Jun 5, 2024):

+1


@suncloudsmoon commented on GitHub (Jun 7, 2024):

+1


@serhatsatir commented on GitHub (Jun 10, 2024):

I also have an Intel Iris Xe with 8 GB of RAM, but I can't see any benefit. It would be very useful if the hardware we have could be used to its full capacity.
Perhaps consulting an AI on how to do it could be a solution. 😂


@carlos-burelo commented on GitHub (Aug 1, 2024):

Personally, I have tried using [WebLLM](https://github.com/mlc-ai/web-llm) to run AI models like Llama3. When I do this, I notice an improvement in token generation speed because the Intel graphics card is being utilized via the WebGPU API. I wouldn't say the improvement is radical, but it is slightly faster; with some caution, I would estimate it at around 15%. My specifications are:

Chipset: Intel Core i7-1165G7
Graphics: Intel Iris Xe Graphics, 15.7 GB shared memory
RAM: 32 GB DDR4, 2667 MT/s

Therefore, results may vary significantly depending on the specifications of each system.


@jomardyan commented on GitHub (Aug 10, 2024):

+1


@ayttop commented on GitHub (Aug 24, 2024):

Ollama can run with an Intel iGPU via ipex-llm:

https://github.com/intel-analytics/ipex-llm/blob/main/docs/mddocs/Quickstart/llama_cpp_quickstart.md
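
(A rough sketch of the workflow from that quickstart, for orientation only; the package name, init script, and binary names below are taken from the guide and may have changed since, so treat them as assumptions and follow the linked document.)

```bash
# Sketch of the ipex-llm llama.cpp setup from the linked quickstart (Linux).
# Commands are paraphrased from that guide and may be outdated.
pip install --pre --upgrade ipex-llm[cpp]

mkdir llama-cpp && cd llama-cpp
init-llama-cpp    # links the ipex-llm builds of the llama.cpp binaries here

# Run with all layers offloaded to the Intel GPU (-ngl 99).
export SYCL_CACHE_PERSISTENT=1
./llama-cli -m ./model.gguf -ngl 99 -p "Once upon a time"
```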


@user7z commented on GitHub (Sep 29, 2024):

> Ollama can run with an Intel iGPU via ipex-llm:
>
> https://github.com/intel-analytics/ipex-llm/blob/main/docs/mddocs/Quickstart/llama_cpp_quickstart.md

Integrated GPU support isn't that good in current versions; see this [issue](https://github.com/intel-analytics/ipex-llm/issues/12120#issuecomment-2379372351).


@havardthom commented on GitHub (Oct 28, 2024):

For anyone interested, I've added an Ollama LXC script to tteck's Proxmox Helper-Scripts. The script installs intel-basekit, builds Ollama from source, and ~~supports Intel iGPU passthrough~~ (though it has a very long install time). It can be run like any other Proxmox helper script: `bash -c "$(wget -qLO - https://github.com/tteck/Proxmox/raw/main/ct/ollama.sh)"`

A script for Open WebUI LXC with optional Ollama install is also available: https://tteck.github.io/Proxmox/#open-webui-lxc

**Edit: NVM, Ollama does not support iGPUs because of VRAM reporting issues; need to wait for https://github.com/ollama/ollama/pull/5593**


@maczet commented on GitHub (Nov 30, 2024):

+1


@SystemStrategy commented on GitHub (Dec 2, 2024):

I ran the tteck Ollama script; it would not load some models and was way slower compared to the Docker version of IPEX.

Intel Docker IPEX:
https://github.com/intel-analytics/ipex-llm/blob/main/docs/mddocs/DockerGuides/docker_cpp_xpu_quickstart.md

Added commands to auto-start:
https://github.com/SystemStrategy/Proxmox/blob/main/Ipex_Compose
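
(For reference, a minimal sketch of starting the IPEX-LLM XPU container described in that guide; the image tag and paths are assumptions drawn from the quickstart and may have changed, so verify against it.)

```bash
# Sketch based on the ipex-llm Docker XPU quickstart; check the linked guide
# for the current image tag and options before using.
# --device=/dev/dri passes the Intel GPU render node into the container.
docker run -itd --name ipex-llm \
  --net=host \
  --device=/dev/dri \
  -v /path/to/models:/models \
  --shm-size=16g \
  intelanalytics/ipex-llm-inference-cpp-xpu:latest
```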


@gaborkukucska commented on GitHub (Dec 7, 2024):

+1


@ChrisBGL commented on GitHub (Dec 8, 2024):

Please add native support for Intel iGPUs.


@user7z commented on GitHub (Dec 8, 2024):

I think it's not the Ollama devs' problem; it's an Intel problem that they can't make their oneAPI usable by the community. And the obscure way ipex-llm is being developed is just insane and wouldn't make it possible for the Ollama devs to integrate it. It's Intel's problem.


@MaoJianwei commented on GitHub (Jul 15, 2025):

Can Ollama use an Intel integrated GPU to speed up inference? E.g., the Intel UHD Graphics 630 of an i5-10400.


@ddpasa commented on GitHub (Jul 15, 2025):

> Can Ollama use an Intel integrated GPU to speed up inference? E.g., the Intel UHD Graphics 630 of an i5-10400.

I'm using an Intel iGPU with llama.cpp via Vulkan. Ollama is just a wrapper around llama.cpp, but for reasons unknown the devs refuse to enable Vulkan. If you have an Intel iGPU, my recommendation is to use llama.cpp directly.
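
(For anyone who wants to follow that advice, a minimal sketch of building llama.cpp with its Vulkan backend; the `GGML_VULKAN` CMake option and `llama-cli` binary name match recent llama.cpp versions, but check the project's build docs if they don't work for you.)

```bash
# Build llama.cpp with the Vulkan backend (needs Vulkan drivers and SDK headers).
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release -j

# Offload as many layers as possible to the iGPU (-ngl 99).
./build/bin/llama-cli -m ./model.gguf -ngl 99 -p "Hello"
```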


@NeoZhangJianyu commented on GitHub (Jul 16, 2025):

This issue has a solution in https://github.com/ollama/ollama/issues/8414.


@MaoJianwei commented on GitHub (Jul 21, 2025):

> > Can Ollama use an Intel integrated GPU to speed up inference? E.g., the Intel UHD Graphics 630 of an i5-10400.
>
> I'm using an Intel iGPU with llama.cpp via Vulkan. Ollama is just a wrapper around llama.cpp, but for reasons unknown the devs refuse to enable Vulkan. If you have an Intel iGPU, my recommendation is to use llama.cpp directly.

@ddpasa Many thanks. I tried running Ollama with the Iris Xe (iGPU) of an Intel i7, but I found the inference speed of the iGPU is close to that of the CPU **(about 18 tokens/s versus 19 tokens/s)**.

So I think it makes no sense to attempt to run Ollama with an iGPU.


@MaoJianwei commented on GitHub (Jul 21, 2025):

> This issue has a solution in [#8414](https://github.com/ollama/ollama/issues/8414).

No, #8414 doesn't support 10th-gen Intel CPUs. @NeoZhangJianyu


@Gunnarr970 commented on GitHub (Jul 21, 2025):

> So I think it makes no sense to attempt to run Ollama with an iGPU.

Even if the speed is the same, CPU resources will remain available for other processes if the GPU is used.


@NeoZhangJianyu commented on GitHub (Jul 22, 2025):

> > This issue has a solution in [#8414](https://github.com/ollama/ollama/issues/8414).
>
> No, #8414 doesn't support 10th-gen Intel CPUs. @NeoZhangJianyu

The iGPU in 10th-gen CPUs isn't supported by oneAPI (SYCL). That's the root cause of why the llama.cpp SYCL backend can't support it.

Refer to [hardware](https://github.com/ggml-org/llama.cpp/blob/master/docs/backend/SYCL.md#hardware).
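
(A quick way to check whether oneAPI can see a given iGPU at all: `sycl-ls` ships with the oneAPI Base Toolkit. The `setvars.sh` path below is the default install location, so adjust it if yours differs.)

```bash
# List the devices visible to the oneAPI/SYCL runtime. If the iGPU does not
# appear as a GPU device here, the llama.cpp SYCL backend cannot use it.
source /opt/intel/oneapi/setvars.sh   # default oneAPI install path
sycl-ls
```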


@MaoJianwei commented on GitHub (Jul 22, 2025):

> > So I think it makes no sense to attempt to run Ollama with an iGPU.
>
> Even if the speed is the same, CPU resources will remain available for other processes if the GPU is used.

Yes, but my purpose in using the iGPU is to speed up inference, and that expectation is not satisfied.


@MaoJianwei commented on GitHub (Jul 22, 2025):

> The iGPU in 10th-gen CPUs isn't supported by oneAPI (SYCL). That's the root cause of why the llama.cpp SYCL backend can't support it.
>
> Refer to [hardware](https://github.com/ggml-org/llama.cpp/blob/master/docs/backend/SYCL.md#hardware).

Thanks, I see. I bought my computer one year too early. What a pity.


@ddpasa commented on GitHub (Jul 22, 2025):

> > > So I think it makes no sense to attempt to run Ollama with an iGPU.
> >
> > Even if the speed is the same, CPU resources will remain available for other processes if the GPU is used.
>
> Yes, but my purpose in using the iGPU is to speed up inference, and that expectation is not satisfied.

Token generation is limited by memory bandwidth, so you'll see very similar speeds on CPU or iGPU. The iGPU helps with input token processing and image processing. I'm getting 2x to 3x speedups on input token processing and VLMs on an Intel 10th-gen iGPU when using llama.cpp with Vulkan.
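
(To see this prefill/decode split on your own hardware, llama.cpp's bundled `llama-bench` reports prompt processing (pp) and token generation (tg) separately; a minimal invocation, assuming a Vulkan build as sketched earlier in this thread:)

```bash
# Benchmark prefill (-p, prompt tokens) and decode (-n, generated tokens)
# separately; on an iGPU, pp t/s typically improves far more than tg t/s.
./build/bin/llama-bench -m ./model.gguf -p 512 -n 128 -ngl 99
```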


@MaoJianwei commented on GitHub (Jul 22, 2025):

> Token generation is limited by memory bandwidth, so you'll see very similar speeds on CPU or iGPU. The iGPU helps with input token processing and image processing. I'm getting 2x to 3x speedups on input token processing and VLMs on an Intel 10th-gen iGPU when using llama.cpp with Vulkan.

Do you mean the iGPU can speed up the prefill phase but not the decode phase? @ddpasa


@MaoJianwei commented on GitHub (Aug 10, 2025):

I found the solution! That's crazy! @NeoZhangJianyu

https://github.com/ggml-org/llama.cpp/issues/1956


@vsenn commented on GitHub (Feb 6, 2026):

+1. It would be great to have iGPU support for Intel hardware, as I have devices like Intel UHD Graphics @ 1.10 GHz [Integrated] and Intel Comet Lake UHD Graphics @ 1.15 GHz [Integrated] with 65 GB RAM/VRAM in my Intel NUC machines running a Kubernetes cluster.


@MaoJianwei commented on GitHub (Feb 6, 2026):

I found the solution! https://github.com/ggml-org/llama.cpp/issues/1956#issuecomment-3172333543
@vsenn


Reference: github-starred/ollama#27673