[PR #11835] [MERGED] Vulkan based on #9650 #18902

Closed
opened 2026-04-16 06:51:12 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/11835
Author: @inforithmics
Created: 8/9/2025
Status: Merged
Merged: 10/14/2025
Merged by: @dhiltgen

Base: main ← Head: vulkanV3


📝 Commits (10+)

  • f46b4a6 implement the vulkan C backend
  • 9c6b049 add support in gpu.go
  • 93c4d69 add support in gen_linux.sh
  • 24c8840 it builds
  • 724fac4 fix segfault
  • e4e8a5d fix compilation
  • 257364c fix free memory monitor
  • 11c55fa fix total memory monitor
  • e77ea68 Merge branch 'refs/heads/main' into vulkan
  • 18f3f96 update gpu.go

📊 Changes

152 files changed (+29425 additions, -15 deletions)

View changed files

📝 .github/workflows/test.yaml (+34 -1)
📝 CMakeLists.txt (+12 -0)
📝 CMakePresets.json (+9 -0)
📝 Dockerfile (+22 -2)
📝 discover/gpu.go (+30 -1)
📝 discover/runner.go (+53 -4)
📝 discover/types.go (+3 -2)
📝 envconfig/config.go (+2 -0)
📝 llama/llama.go (+3 -1)
➕ llama/patches/0027-vulkan-get-GPU-ID-ollama-v0.11.5.patch (+95 -0)
➕ llama/patches/0028-vulkan-pci-and-memory.patch (+253 -0)
📝 llm/server.go (+1 -0)
📝 ml/backend/ggml/ggml.go (+7 -3)
📝 ml/backend/ggml/ggml/.rsync-filter (+4 -0)
➕ ml/backend/ggml/ggml/src/ggml-vulkan/CMakeLists.txt (+211 -0)
➕ ml/backend/ggml/ggml/src/ggml-vulkan/ggml-vulkan.cpp (+13903 -0)
➕ ml/backend/ggml/ggml/src/ggml-vulkan/vulkan-shaders/CMakeLists.txt (+31 -0)
➕ ml/backend/ggml/ggml/src/ggml-vulkan/vulkan-shaders/acc.comp (+29 -0)
➕ ml/backend/ggml/ggml/src/ggml-vulkan/vulkan-shaders/add.comp (+69 -0)
➕ ml/backend/ggml/ggml/src/ggml-vulkan/vulkan-shaders/add_id.comp (+42 -0)

...and 80 more files

📄 Description

A pull request based on https://github.com/ollama/ollama/pull/9650, updated with the newest patches from main.

  • Automatic merges, even with AI assistance, probably won't work; this is a draft to see what fails and what needs to be done to make it work.
  • Needs some additional commits to revert unintended code changes.
  • Needs Testing
    • Test Windows
      • AMD
      • Intel
      • Nvidia
    • Test Linux
    • Test Docker
  • A quick look at the merge showed that some changes, e.g. in ggml.go, need to be reverted (probably bad merges).
  • Update Vulkan llama.cpp code.
  • Update Vulkan when https://github.com/ollama/ollama/pull/11823 went in.
  • Compiles on Windows
  • Enable Flash Attention again
  • It seems that it does not use Vulkan (it detects the device, but execution still seems to run on the CPU); the cause was a missing lib folder.
  • Learn how to build a release version of ollama (setup, zip).
  • Intel Vulkan KV Cache Quantization does not work.
  • gpt-oss does not work yet; needs a llama.cpp update.
  • Fix unit tests.
  • Seems the memory estimation is too optimistic (fixed with the new estimation logic).
  • Don't use mmap with Vulkan by default if nothing is specified (see the mmap default sketch after this list).
  • Enable Flash Attention when supported (not needed; Flash Attention should be supported on all devices now).
  • Update to newest master; my last merge was faulty and caused the Vulkan backend not to work. Reset to 834a66689e, which works.
    Known issues:
  • Update Vulkan when llama.cpp is synchronized to at least https://github.com/ggml-org/llama.cpp/pull/15334.
    Without this, OLLAMA_KV_CACHE_TYPE=f16 has to be set or else the llama runner crashes.
  • Fix GPU id on Windows: https://github.com/MooreThreads/ollama-musa/commit/06b8c3c39484190185a4c5a5a16299ac784142f2
  • What happens when two backends support the same GPU (ROCm and Vulkan, or CUDA and Vulkan)?
    (On llama.cpp, Vulkan and ROCm are sometimes in the same performance range; for Nvidia I cannot say.)
    Filter out Vulkan devices whose ID already exists in ROCm or CUDA (see the device-filtering sketch after this list).
    https://github.com/ggml-org/llama.cpp/pull/15947
  • On an AMD 7900 XTX, Qwen3 shows as 100% on GPU but is really only partially calculated on the GPU (maybe a Vulkan update helps).
  • Patch validation fails (make -f Makefile.sync clean checkout apply-patches sync) when run under WSL.
  • Make CI green
  • Add Vulkan Build to Test Build Matrix
    The Vulkan builds ran successfully here: https://github.com/inforithmics/ollama/pull/7
  • Add Vulkan Build to Release Build Matrix (still open).
  • Only select supported Vulkan devices: https://github.com/ggml-org/llama.cpp/pull/15976/files
  • Review patches from here: https://github.com/whyvl/ollama-vulkan/issues/7#issuecomment-2660836871
    For example, GGML_VK_VISIBLE_DEVICES (see the environment-variable sketch after this list).
  • Update from main: https://github.com/inforithmics/ollama/pull/9 (Vulkan detection does not seem to work yet).
  • Vulkan coopmat1 Flash Attention Fix https://github.com/ggml-org/llama.cpp/pull/16365
  • Gemma3:12B fails on my iGPU; fixed by the newest llama.cpp update https://github.com/ollama/ollama/pull/12552.
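
Regarding the mmap default item above: a minimal Go sketch of the intended behavior (do not memory-map the model on Vulkan unless the user explicitly asked for it). The Backend and LoadOptions names are illustrative only, not the actual ollama types.

```go
// Hypothetical sketch, not the actual ollama code: pick the effective mmap
// setting from the discovered backend. Backend and LoadOptions are
// illustrative names only.
package main

import "fmt"

type Backend int

const (
	BackendCPU Backend = iota
	BackendCUDA
	BackendROCm
	BackendVulkan
)

// LoadOptions carries the user's explicit choice; a nil UseMMap means
// "nothing specified".
type LoadOptions struct {
	UseMMap *bool
}

// resolveMMap honors an explicit user setting and otherwise defaults to
// false on Vulkan and true everywhere else.
func resolveMMap(opts LoadOptions, backend Backend) bool {
	if opts.UseMMap != nil {
		return *opts.UseMMap
	}
	return backend != BackendVulkan
}

func main() {
	fmt.Println(resolveMMap(LoadOptions{}, BackendVulkan)) // false: Vulkan default
	fmt.Println(resolveMMap(LoadOptions{}, BackendCUDA))   // true: mmap stays on elsewhere
	on := true
	fmt.Println(resolveMMap(LoadOptions{UseMMap: &on}, BackendVulkan)) // true: explicit override wins
}
```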
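
Regarding the duplicate-backend item above: a minimal Go sketch, assuming each discovered device exposes a library name and a PCI bus ID, of how Vulkan entries that point at a GPU already claimed by CUDA or ROCm could be dropped. The GPUInfo struct and its fields are illustrative and not the actual discover package types.

```go
// Hypothetical sketch, not the actual discover package: drop Vulkan devices
// whose PCI bus ID is already claimed by a CUDA or ROCm device so the same
// GPU is not scheduled twice.
package main

import "fmt"

type GPUInfo struct {
	Library  string // "cuda", "rocm", or "vulkan"
	PCIBusID string
	Name     string
}

// filterDuplicateVulkan keeps every non-Vulkan device and only those Vulkan
// devices whose PCI bus ID was not already seen from another backend.
func filterDuplicateVulkan(devices []GPUInfo) []GPUInfo {
	seen := make(map[string]bool)
	for _, d := range devices {
		if d.Library != "vulkan" && d.PCIBusID != "" {
			seen[d.PCIBusID] = true
		}
	}
	var out []GPUInfo
	for _, d := range devices {
		if d.Library == "vulkan" && seen[d.PCIBusID] {
			continue // same physical GPU already handled by CUDA/ROCm
		}
		out = append(out, d)
	}
	return out
}

func main() {
	devs := []GPUInfo{
		{Library: "rocm", PCIBusID: "0000:03:00.0", Name: "Radeon RX 7900 XTX"},
		{Library: "vulkan", PCIBusID: "0000:03:00.0", Name: "Radeon RX 7900 XTX (Vulkan)"},
		{Library: "vulkan", PCIBusID: "0000:0a:00.0", Name: "iGPU (Vulkan only)"},
	}
	for _, d := range filterDuplicateVulkan(devs) {
		fmt.Println(d.Library, d.Name)
	}
}
```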
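
Regarding GGML_VK_VISIBLE_DEVICES: a hedged sketch of how a launcher could restrict which Vulkan devices the ggml backend sees by setting that variable on the spawned runner, assuming it takes a comma-separated list of device indices as in the patches referenced above. The runner binary name and flags are placeholders.

```go
// Hypothetical sketch: restrict which Vulkan devices the ggml backend sees by
// setting GGML_VK_VISIBLE_DEVICES on the spawned runner process. The variable
// is assumed to take a comma-separated list of device indices.
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strconv"
	"strings"
)

// visibleDevicesEnv formats the selected Vulkan device indices,
// e.g. []int{0, 2} -> "GGML_VK_VISIBLE_DEVICES=0,2".
func visibleDevicesEnv(indices []int) string {
	parts := make([]string, len(indices))
	for i, idx := range indices {
		parts[i] = strconv.Itoa(idx)
	}
	return "GGML_VK_VISIBLE_DEVICES=" + strings.Join(parts, ",")
}

func main() {
	env := visibleDevicesEnv([]int{0, 2})
	fmt.Println("launching runner with", env)

	// Illustrative runner invocation; the binary name and flags are placeholders.
	cmd := exec.Command("./ollama-runner", "--model", "model.gguf")
	cmd.Env = append(os.Environ(), env)
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	if err := cmd.Run(); err != nil {
		fmt.Fprintln(os.Stderr, "runner exited:", err)
	}
}
```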

Version 12.5

OllamaSetup.zip: https://github.com/user-attachments/files/22864449/OllamaSetup.zip

Build with build_windows.ps1:

Some interesting Links:
Vulkan vs ROCm on Linux:
https://www.phoronix.com/review/llama-cpp-windows-linux/5
https://www.phoronix.com/review/amd-rocm-7-strix-halo/3


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-16 06:51:12 -05:00
Reference: github-starred/ollama#18902