[PR #5059] [CLOSED] Add Vulkan support to ollama #11665

Closed
opened 2026-04-12 23:35:10 -05:00 by GiteaMirror · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/5059
Author: @ghost
Created: 6/15/2024
Status: Closed

Base: main ← Head: vulkan


📝 Commits (10+)

- f46b4a6 implement the vulkan C backend
- 9c6b049 add support in gpu.go
- 93c4d69 add support in gen_linux.sh
- 24c8840 it builds
- 724fac4 fix segfault
- e4e8a5d fix compilation
- 257364c fix free memory monitor
- 11c55fa fix total memory monitor
- e77ea68 Merge branch 'refs/heads/main' into vulkan
- 18f3f96 update gpu.go

📊 Changes

86 files changed (+15105 additions, -2 deletions)

View changed files

📝 CMakeLists.txt (+13 -0)
📝 Makefile.sync (+1 -1)
📝 discover/gpu.go (+112 -1)
📝 discover/gpu_info.h (+1 -0)
➕ discover/gpu_info_vulkan.c (+228 -0)
➕ discover/gpu_info_vulkan.h (+66 -0)
📝 discover/gpu_linux.go (+18 -0)
📝 discover/gpu_windows.go (+9 -0)
📝 discover/types.go (+7 -0)
➕ discover/vulkan_common.go (+19 -0)
📝 envconfig/config.go (+2 -0)
📝 ml/backend/ggml/ggml/.rsync-filter (+3 -0)
➕ ml/backend/ggml/ggml/src/ggml-vulkan/CMakeLists.txt (+92 -0)
➕ ml/backend/ggml/ggml/src/ggml-vulkan/ggml-vulkan.cpp (+8745 -0)
➕ ml/backend/ggml/ggml/src/ggml-vulkan/vulkan-shaders/CMakeLists.txt (+9 -0)
➕ ml/backend/ggml/ggml/src/ggml-vulkan/vulkan-shaders/acc.comp (+29 -0)
➕ ml/backend/ggml/ggml/src/ggml-vulkan/vulkan-shaders/add.comp (+29 -0)
➕ ml/backend/ggml/ggml/src/ggml-vulkan/vulkan-shaders/argsort.comp (+69 -0)
➕ ml/backend/ggml/ggml/src/ggml-vulkan/vulkan-shaders/clamp.comp (+17 -0)
➕ ml/backend/ggml/ggml/src/ggml-vulkan/vulkan-shaders/concat.comp (+41 -0)

...and 66 more files

📄 Description

Edit: (2025/01/19)

It's been around 7 months and the ollama devs don't seem interested in merging this PR. I'll maintain this fork as a separate project from now on. If you run into any issues, please raise them in the fork's repo so I can keep track of them.

This PR adds Vulkan support to ollama with a proper memory-monitoring implementation. It closes #2033 and replaces #2578, which does not implement proper memory monitoring.

Note that this implementation does not support GPUs without VkPhysicalDeviceMemoryBudgetPropertiesEXT support. This shouldn't be a problem on Linux, where the Mesa driver supports it for all Intel devices as far as I know.
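For context, the memory budget query this implementation relies on works by chaining VkPhysicalDeviceMemoryBudgetPropertiesEXT into vkGetPhysicalDeviceMemoryProperties2. A minimal C sketch (not the PR's actual code; assumes a valid VkPhysicalDevice, the Vulkan SDK, and that VK_EXT_memory_budget has been confirmed present; error handling elided):

```c
#include <stdio.h>
#include <vulkan/vulkan.h>

static void print_memory_budget(VkPhysicalDevice dev) {
    // The EXT struct is chained via pNext so the driver fills in
    // per-heap budget and usage alongside the core memory properties.
    VkPhysicalDeviceMemoryBudgetPropertiesEXT budget = {
        .sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_MEMORY_BUDGET_PROPERTIES_EXT,
    };
    VkPhysicalDeviceMemoryProperties2 props = {
        .sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_MEMORY_PROPERTIES_2,
        .pNext = &budget,
    };
    vkGetPhysicalDeviceMemoryProperties2(dev, &props);

    for (uint32_t i = 0; i < props.memoryProperties.memoryHeapCount; i++) {
        // heapBudget[i]: how much this process may allocate from heap i;
        // heapUsage[i]: how much it has currently allocated from heap i.
        printf("heap %u: budget=%llu usage=%llu\n", i,
               (unsigned long long)budget.heapBudget[i],
               (unsigned long long)budget.heapUsage[i]);
    }
}
```

This is why the extension is a hard requirement here: without it there is no portable way to get a live free-memory estimate from a Vulkan driver.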

The CAP_PERFMON capability is also needed for memory monitoring. When running ollama as a systemd service, you can grant it by adding AmbientCapabilities=CAP_PERFMON to the unit, or you can simply run ollama as root.
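For the systemd route, a drop-in override along these lines should work (the unit name ollama.service and the drop-in path are assumptions; adjust for your distro):

```ini
# /etc/systemd/system/ollama.service.d/perfmon.conf
# Grant CAP_PERFMON so the Vulkan backend can read GPU memory counters
# without running the whole service as root.
[Service]
AmbientCapabilities=CAP_PERFMON
```

Apply it with `systemctl daemon-reload` followed by `systemctl restart ollama`.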

Vulkan devices that are CPUs under the hood (e.g. llvmpipe) are also unsupported. This is intentional, to avoid accidentally using the CPU for "accelerated" inference. Let me know if you think this behavior should change.
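Filtering these devices out is straightforward, since the driver reports its device type. A sketch of the check (illustrative only, not the PR's exact code; requires the Vulkan SDK and a valid VkPhysicalDevice):

```c
#include <stdbool.h>
#include <vulkan/vulkan.h>

// Returns true for software rasterizers such as llvmpipe, which
// enumerate as Vulkan devices but execute on the CPU.
static bool is_cpu_device(VkPhysicalDevice dev) {
    VkPhysicalDeviceProperties props;
    vkGetPhysicalDeviceProperties(dev, &props);
    return props.deviceType == VK_PHYSICAL_DEVICE_TYPE_CPU;
}
```

The inxi output below illustrates why this matters: the test machine enumerates two Vulkan devices, the Arc A770 (type: discrete-gpu) and a CPU device, and only the former should be used for inference.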

I haven't tested this on Windows, nor have I implemented the Windows build logic for Vulkan support, because I don't use Windows. If someone could help with that, it would be great.

I've tested this on my machine with an Intel Arc A770:

System:
  Host: rofl Kernel: 6.8.11 arch: x86_64 bits: 64 compiler: gcc v: 13.2.0
  Console: pty pts/2 Distro: NixOS 24.05 (Uakari)
CPU:
  Info: 8-core (4-mt/4-st) model: Intel 0000 bits: 64 type: MST AMCP arch: Raptor Lake rev: 2
    cache: L1: 704 KiB L2: 7 MiB L3: 12 MiB
  Speed (MHz): avg: 473 high: 1100 min/max: 400/4500:3400 cores: 1: 400 2: 400 3: 400 4: 576
    5: 400 6: 400 7: 400 8: 400 9: 400 10: 400 11: 1100 12: 400 bogomips: 59904
  Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3
Graphics:
  Device-1: Intel DG2 [Arc A770] vendor: Acer Incorporated ALI driver: i915 v: kernel
    arch: Gen-12.7 pcie: speed: 2.5 GT/s lanes: 1 ports: active: DP-1 empty: DP-2, DP-3, DP-4,
    HDMI-A-1, HDMI-A-2, HDMI-A-3 bus-ID: 03:00.0 chip-ID: 8086:56a0
  Display: server: No display server data found. Headless machine? tty: 98x63
  Monitor-1: DP-1 model: Daewoo HDMI res: 1024x600 dpi: 55 diag: 537mm (21.1")
  API: Vulkan v: 1.3.283 surfaces: N/A device: 0 type: discrete-gpu driver: N/A
    device-ID: 8086:56a0 device: 1 type: cpu driver: N/A device-ID: 10005:0000

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-12 23:35:10 -05:00

Reference: github-starred/ollama#11665