[PR #2578] [CLOSED] First attempt at Vulkan: WIP, do not merge #42183

Closed
opened 2026-04-24 21:58:10 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/2578
Author: @ddpasa
Created: 2/18/2024
Status: Closed

Base: `main` ← Head: `vulkan`


📝 Commits (1)

- [`19f08d3`](https://github.com/ollama/ollama/commit/19f08d3b14025455a701bacf6ae35eba41e64eea) First attempt at Vulkan

📊 Changes

3 files changed (+35 additions, -1 deletions)


📝 `gpu/gpu.go` (+8 -1)
➕ `gpu/vulkan.go` (+21 -0) (new file; see the sketch below)
➕ `vulkan_install.sh` (+6 -0) (new file)
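
As a rough illustration of what a ~21-line `gpu/vulkan.go` detection stub might look like, here is a minimal sketch. This is not the PR's actual code; the function name `checkVulkan` and the probed paths are assumptions made for illustration only.

```go
// Hypothetical sketch of a minimal Vulkan detection helper for the gpu
// package. Not the code from this PR; names and paths are illustrative.
package gpu

import (
	"os"
	"os/exec"
)

// checkVulkan reports whether a Vulkan stack appears to be available by
// looking for the vulkaninfo utility (from vulkan-tools) or a common
// loader library location on Linux.
func checkVulkan() bool {
	if _, err := exec.LookPath("vulkaninfo"); err == nil {
		return true
	}
	for _, lib := range []string{
		"/usr/lib/x86_64-linux-gnu/libvulkan.so.1",
		"/usr/lib/libvulkan.so.1",
	} {
		if _, err := os.Stat(lib); err == nil {
			return true
		}
	}
	return false
}
```

A detection helper along these lines would presumably be called from the existing GPU discovery path in `gpu/gpu.go` (the +8/-1 change), falling back to Vulkan when CUDA or ROCm are not found.
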

📄 Description

This is a very preliminary hack of Vulkan support, which llama.cpp recently added.

This is not intended to be merged; the code is far from ready. I just want to get feedback from the ollama devs and some pointers.

I tested this on an Intel Iris Plus G7 GPU on Linux. Phi-2 works fine, with a 20%-50% speedup compared to the CPU with VNNI enabled. Multimodal models such as Bakllava behave incorrectly and produce empty output, which I'm still debugging.

I think I need to pull the latest llama.cpp commits to make it work properly, but updating the submodule is throwing bizarre compile-time errors.

Discussion in: https://github.com/ollama/ollama/issues/2396


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-24 21:58:10 -05:00