[GH-ISSUE #9165] Inconsistent results when mixing multiple nvidia gpu #5965

Open
opened 2026-04-12 17:18:58 -05:00 by GiteaMirror · 1 comment

Originally created by @galvanoid on GitHub (Feb 17, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/9165

What is the issue?

I have this setup:

2 x RTX 3060
1 x GTX 1060

Everything worked perfectly until I updated the Nvidia drivers to version 565. Since then I have the following problem.

If I load a model that fits on one or both RTX 3060s, it works fine, with consistent responses (especially in RAG).

But if I load a model that spans both RTX 3060s and the GTX 1060 (a different model, or the same one in a larger quant), the responses are inconsistent and incoherent.

If I update the Nvidia drivers to version 570 (latest), the result is the same.

Note that every time I update the drivers, the CUDA version is also updated, so I don't know if it's something related to the drivers or the CUDA version.
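To make reports like this easier to compare across driver versions, it can help to record the driver version and each GPU's compute capability next to every test run. The following is a minimal sketch, not part of the original report. It assumes `nvidia-smi` is on the PATH and that the installed version supports the `compute_cap` query field; the helper names (`parse_gpu_csv`, `query_gpus`, `mixed_generations`) are hypothetical.

```python
import csv
import io
import subprocess

# Fields requested from nvidia-smi; `compute_cap` is assumed to be supported
# by the installed nvidia-smi (it may be missing on older releases).
QUERY_FIELDS = ["index", "name", "driver_version", "compute_cap"]


def parse_gpu_csv(text):
    """Parse `nvidia-smi --format=csv,noheader` output into a list of dicts."""
    gpus = []
    for row in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not row:
            continue  # skip blank lines
        gpus.append(dict(zip(QUERY_FIELDS, (cell.strip() for cell in row))))
    return gpus


def query_gpus():
    """Run nvidia-smi and return one dict per GPU (hypothetical helper)."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=" + ",".join(QUERY_FIELDS),
            "--format=csv,noheader",
        ],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_gpu_csv(result.stdout)


def mixed_generations(gpus):
    """Return True if the GPUs report more than one compute capability."""
    return len({gpu["compute_cap"] for gpu in gpus}) > 1
```

Calling `mixed_generations(query_gpus())` on a machine like the one described here would be expected to return `True`, since the RTX 3060 and GTX 1060 belong to different GPU generations. Note that this records the driver version only, so it does not by itself separate a driver change from a CUDA version change.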

However, when I downgrade the driver to version 535 or 550, the responses are correct and coherent.

As I said, this is especially noticeable when using RAG (document collections).
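One way to turn "inconsistent" into something measurable is to send the same prompt several times with sampling pinned down and count the distinct answers. The sketch below is an illustration only, not something the reporter ran. It assumes Ollama is listening on its default local address and uses the `/api/generate` endpoint with `temperature` and `seed` options; the helper names and the placeholder model name in the usage note are hypothetical. Identical output across runs is not guaranteed on every configuration, so a difference is a hint rather than proof.

```python
import json
import urllib.request

# Default local Ollama endpoint; change it if the server runs elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(model, prompt, seed=42):
    """Build a non-streaming request with temperature 0 and a fixed seed."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"temperature": 0, "seed": seed},
    }


def generate(model, prompt, seed=42, url=OLLAMA_URL):
    """Send one request to Ollama and return the generated text."""
    data = json.dumps(build_payload(model, prompt, seed)).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]


def distinct_outputs(outputs):
    """Count distinct responses, ignoring leading and trailing whitespace."""
    return len({text.strip() for text in outputs})
```

For example, `distinct_outputs([generate("some-model", "Summarize: ...") for _ in range(5)])` could be run once with the model on the RTX 3060s only and once with the GTX 1060 included, under each driver version, and the counts compared.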

Regards.

Relevant log output


OS

Linux

GPU

Nvidia

CPU

Intel

Ollama version

0.5.11

GiteaMirror added the bug label 2026-04-12 17:18:58 -05:00

@dnielso5 commented on GitHub (Jan 17, 2026):

Following, as I have a 3060 and a 1060 in my TrueNAS box, and upgrading to 25.10 updates the driver to 570.


Reference: github-starred/ollama#5965