[GH-ISSUE #12276] Ollama v0.11.11-rc1 - CUDA error: invalid argument #70220

Closed
opened 2026-05-04 20:42:02 -05:00 by GiteaMirror · 6 comments
Owner

Originally created by @eXt73 on GitHub (Sep 12, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/12276

What is the issue?

Ollama runs with the following settings:

Environment="OLLAMA_FLASH_ATTENTION=1"
Environment="OLLAMA_KV_CACHE_TYPE=q4_0"
Environment="OLLAMA_NEW_ESTIMATES=1"
Environment="OLLAMA_NEW_ENGINE=1"

The model is Mistral-3.2-nov-2506-eXt73-tuned-v1.5-GGUF:Q6_K - a copy of hf.co/unsloth/Mistral-Small-3.2-24B-Instruct-2506-GGUF:Q6_K with the vision module removed and modified parameters.

NVIDIA-SMI 580.82.07 Driver Version: 580.82.07 CUDA Version: 13.0

It works flawlessly with Ollama v0.11.10, but v0.11.11-rc1 fails with the following error:

ollama run Mistral-3.2-nov-2506-eXt73-tuned-v1.5-GGUF:Q6_K
Error: 500 Internal Server Error: llama runner process has terminated: CUDA error: invalid argument
current device: 0, in function ggml_backend_cuda_buffer_set_tensor at //ml/backend/ggml/ggml/src/ggml-cuda/ggml-cuda.cu:667
cudaMemcpyAsync((char *)tensor->data + offset, data, size, cudaMemcpyHostToDevice, ((cudaStream_t)0x2))
//ml/backend/ggml/ggml/src/ggml-cuda/ggml-cuda.cu:84: CUDA error
ext73@ext73-kernel:~$
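
Since v0.11.10 is reported to work, one hypothetical interim step is pinning that version with the official Linux install script, which supports an `OLLAMA_VERSION` variable:

```shell
# Reinstall the last known-good release while the regression is
# investigated (version pinning via OLLAMA_VERSION is documented
# install-script behavior):
curl -fsSL https://ollama.com/install.sh | OLLAMA_VERSION=0.11.10 sh
```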

[Image: terminal screenshot](https://github.com/user-attachments/assets/388c28ad-400b-473b-aa10-7882c03fd81e)

Relevant log output

(none provided)
OS

Kubuntu Linux

GPU

RTX 5090

CPU

AMD Ryzen 9 9950X3D

Ollama version

v0.11.11-rc1

GiteaMirror added the bug label 2026-05-04 20:42:02 -05:00
Author
Owner

@rick-github commented on GitHub (Sep 12, 2025):

A full text [log](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#how-to-troubleshoot-issues) will include a stack trace that may help with debugging. Details of the fine-tuning may also be useful, as would access to the model.
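
For reference, a typical way to capture that full log on a systemd install (standard `journalctl` usage; the file name is arbitrary):

```shell
# Dump the complete service log to a file to attach to the issue:
journalctl -u ollama --no-pager > ollama.log

# Or follow live output while reproducing the failure:
journalctl -u ollama -f
```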

Author
Owner

@eXt73 commented on GitHub (Sep 13, 2025):

Interestingly, this happens with every model that runs on the new engine - for example, when the standard Mistral is started with the old engine, there is no problem. And what's strange is that I don't see anything in the logs - nothing is generated while listening with `sudo journalctl -u ollama --follow`.

ext73@ext73-kernel:~$ ollama ls
NAME                                                            ID              SIZE      MODIFIED
Mistral-3.2-nov-2506-eXt73-tuned-v1.5-GGUF:Q4                   754d5d5cb6b8    14 GB     45 minutes ago
hf.co/unsloth/Mistral-Small-3.2-24B-Instruct-2506-GGUF:latest   39224c6559c9    15 GB     28 hours ago
hf.co/unsloth/Mistral-Small-3.2-24B-Instruct-2506-GGUF:Q6_K     4206c4a9ec35    20 GB     2 days ago
embeddinggemma:latest                                           85462619ee72    621 MB    7 days ago
hf.co/bartowski/Qwen_Qwen3-4B-Instruct-2507-GGUF:Q8_0           67fab01b4791    4.3 GB    5 weeks ago
ext73@ext73-kernel:~$ ollama run hf.co/bartowski/Qwen_Qwen3-4B-Instruct-2507-GGUF:Q8_0
Error: 500 Internal Server Error: llama runner process has terminated: CUDA error: context is destroyed
current device: 0, in function ggml_backend_cuda_buffer_set_tensor at //ml/backend/ggml/ggml/src/ggml-cuda/ggml-cuda.cu:667
cudaMemcpyAsync((char *)tensor->data + offset, data, size, cudaMemcpyHostToDevice, ((cudaStream_t)0x2))
//ml/backend/ggml/ggml/src/ggml-cuda/ggml-cuda.cu:84: CUDA error
ext73@ext73-kernel:~$ ollama run hf.co/unsloth/Mistral-Small-3.2-24B-Instruct-2506-GGUF:latest

[Image: terminal screenshot](https://github.com/user-attachments/assets/2953151f-d063-4bf0-8729-8b4cc8f24be0)
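
Given the observation above that the same models load fine on the old engine, a hypothetical interim workaround (not from the thread) is turning the new engine back off in the service override:

```shell
# Hypothetical workaround sketch: disable the new engine via the same
# systemd override used for the other settings, then restart.
# In the [Service] section of the override, change:
#   Environment="OLLAMA_NEW_ENGINE=0"
sudo systemctl daemon-reload
sudo systemctl restart ollama
```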
Author
Owner

@alsavu commented on GitHub (Sep 16, 2025):

I am having the exact same issue, but not with rc1 of 0.11.11 - with the full final release. I updated today and hit the same error. Any progress on this?

Author
Owner

@rick-github commented on GitHub (Sep 16, 2025):

https://github.com/ollama/ollama/issues/12276#issuecomment-3286942677

Author
Owner

@eXt73 commented on GitHub (Sep 16, 2025):

I can confirm that this problem no longer occurs in my case under stable 0.11.11 - it loads the 'neutered' Mistral (vision module removed) on the new engine beautifully :)

[Image: terminal screenshot](https://github.com/user-attachments/assets/1e77fb35-ddaa-40e9-b28a-1b14b88dc0cd)
Author
Owner

@alsavu commented on GitHub (Sep 18, 2025):

I can also confirm this problem no longer occurs for me. I reinstalled version 0.11.11 and it works very well - I tested more than 30 models and had no issues. Thank you!
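
For anyone verifying the fix after upgrading, a couple of standard checks (model name reused from earlier in the thread):

```shell
# Confirm the installed version is the stable release:
ollama -v    # expect: ollama version is 0.11.11

# Re-run a model that previously failed; it should now load cleanly:
ollama run hf.co/bartowski/Qwen_Qwen3-4B-Instruct-2507-GGUF:Q8_0
```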
