[GH-ISSUE #13126] Ollama cannot properly use the GPU on a dual-GPU laptop after enabling Vulkan #70746

Open
opened 2026-05-04 22:49:14 -05:00 by GiteaMirror · 5 comments

Originally created by @Arondight on GitHub (Nov 18, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/13126

Originally assigned to: @jessegross on GitHub.

What is the issue?

As shown in the picture, my laptop’s GPU0 is the Intel integrated graphics, and GPU1 is the NVIDIA MX330 graphics card, but Vulkan can only use the integrated graphics.

C:\Users\user>ollama ps
NAME        ID              SIZE      PROCESSOR    CONTEXT    UNTIL
qwen3:4b    359d7dd4bcda    3.5 GB    100% GPU     4096       2 minutes from now
<img width="1390" height="977" alt="Image" src="https://github.com/user-attachments/assets/c53a6375-cb1e-4815-b840-093c9658a41b" />
<img width="972" height="932" alt="Image" src="https://github.com/user-attachments/assets/947285fe-a529-4076-beb7-43255f8c6cd8" />

Relevant log output


OS

Windows

GPU

Nvidia

CPU

Intel

Ollama version

0.12.11

GiteaMirror added the vulkan, bug labels 2026-05-04 22:49:14 -05:00

@dhiltgen commented on GitHub (Nov 18, 2025):

Please start the server with OLLAMA_DEBUG="2" set and share your startup logs so we can see what's going wrong. It should favor the discrete GPU over an iGPU.

https://github.com/ollama/ollama/blob/main/docs/troubleshooting.mdx
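For reference, a minimal sketch of how to capture those startup logs in a Windows cmd session (OLLAMA_DEBUG and `ollama serve` are from the comment above; the quit-first step is an assumption about the tray app holding the port):

```shell
:: Sketch for Windows cmd: quit the tray/background Ollama app first so the
:: port is free, then set the debug level for this session only and run the
:: server in the foreground so the startup GPU-discovery logs are visible.
set OLLAMA_DEBUG=2
ollama serve
```

The GPU discovery lines appear near the top of the output, before any model is loaded.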


@Arondight commented on GitHub (Nov 19, 2025):

> Please start the server with OLLAMA_DEBUG="2" set and share your startup logs so we can see what's going wrong. It should favor the discrete GPU over an iGPU.
>
> https://github.com/ollama/ollama/blob/main/docs/troubleshooting.mdx

Thanks for your reply. Here is my Ollama log with qwen3:1.7b.

C:\Users\user>ollama ps
NAME          ID              SIZE      PROCESSOR    CONTEXT    UNTIL
qwen3:1.7b    8f68893c685c    1.9 GB    100% GPU     4096       About a minute from now

C:\Users\user>nvidia-smi
Wed Nov 19 18:34:01 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 573.57                 Driver Version: 573.57         CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                  Driver-Model | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce MX330         WDDM  |   00000000:2D:00.0 Off |                  N/A |
| N/A   59C    P8            N/A  / 5001W |       0MiB /   2048MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A            2848    C+G   ...ram Files\Tencent\QQNT\QQ.exe      N/A      |
|    0   N/A  N/A           12388    C+G   ...al\Programs\Ollama\ollama.exe      N/A      |
+-----------------------------------------------------------------------------------------+

[CWINDOWSsystem32cmd.exe - ollama serve.txt](https://github.com/user-attachments/files/23625341/CWINDOWSsystem32cmd.exe.-.ollama.serve.txt)


@jessegross commented on GitHub (Nov 19, 2025):

We don't currently allow a single model to be loaded across multiple GPU libraries - CUDA and Vulkan in this case. Although we try to favor discrete GPUs, that preference only applies within a single library. In the end, the Vulkan library is chosen because it allows offloading more of the model.

It's probably OK to span libraries in the case of iGPUs, but we'll need to do more testing to see whether that's actually a good idea.
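Given that per-library selection behavior, a hedged workaround sketch for this laptop: keep the experimental Vulkan backend disabled so only the CUDA library enumerates GPUs and the model lands on the discrete NVIDIA card. This assumes the build gates Vulkan behind the OLLAMA_VULKAN environment variable (check the startup logs to confirm which backends are discovered):

```shell
:: Sketch, not an official fix: with Vulkan off, GPU discovery only sees the
:: CUDA device (the MX330), so the scheduler cannot pick the Vulkan iGPU.
set OLLAMA_VULKAN=0
ollama serve
```

The trade-off is losing the iGPU's extra offload capacity, which is exactly what made Vulkan win the placement decision described above.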


@d-shehu commented on GitHub (Dec 18, 2025):

I'm also running into an issue loading a larger model across two identical Radeon GPUs using Vulkan. Is that also unsupported in Ollama?


@getter3 commented on GitHub (Jan 10, 2026):

Same here. I am using an RTX 5070 with 8 GB VRAM and an integrated Radeon 890M.

Reference: github-starred/ollama#70746