[GH-ISSUE #12366] run gemma3:12b Error: 500 Internal Server Error: llama runner process has terminated #70274

Closed
opened 2026-05-04 20:55:06 -05:00 by GiteaMirror · 4 comments

Originally created by @1574802103 on GitHub (Sep 22, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/12366

What is the issue?

```shell
C:\Users\zhang>ollama run gemma3:12b
Error: 500 Internal Server Error: llama runner process has terminated: CUDA error
C:\a\ollama\ollama\ml\backend\ggml\ggml\src\ggml-cuda\ggml-cuda.cu:84: CUDA error
C:\a\ollama\ollama\ml\backend\ggml\ggml\src\ggml-cuda\ggml-cuda.cu:84: CUDA error
C:\a\ollama\ollama\ml\backend\ggml\ggml\src\ggml-cuda\ggml-cuda.cu:84: CUDA error
C:\a\ollama\ollama\ml\backend\ggml\ggml\src\ggml-cuda\ggml-cuda.cu:84: CUDA error
C:\a\ollama\ollama\ml\backend\ggml\ggml\src\ggml-cuda\ggml-cuda.cu:84: CUDA error
```

Relevant log output

(none provided)
OS

Windows

GPU

Nvidia

CPU

Intel

Ollama version

0.12.0

GiteaMirror added the bug label 2026-05-04 20:55:06 -05:00

@1574802103 commented on GitHub (Sep 22, 2025):

```shell
C:\Users\zhang>ollama run gemma3:12b
Error: 500 Internal Server Error: llama runner process has terminated: CUDA error: invalid argument

C:\Users\zhang>ollama run gemma3:12b
Error: 500 Internal Server Error: llama runner process has terminated: CUDA error: invalid argument
current device: 0, in function ggml_backend_cuda_buffer_set_tensor at C:\a\ollama\ollama\ml\backend\ggml\ggml\src\ggml-cuda\ggml-cuda.cu:667
current device: 0, in function ggml_backend_cuda_buffer_set_tensor at C:\a\ollama\ollama\ml\backend\ggml\ggml\src\ggml-cuda\ggml-cuda.cu:667
cudaMemcpyAsync((char *)tensor->data + offset, data, size, cudaMemcpyHostToDevice, ((cudaStream_t)0x2))
CUDA error: invalid argument
C:\a\ollama\ollama\ml\backend\ggml\ggml\src\ggml-cuda\ggml-cuda.cu:84: CUDA error
cudaMemcpyAsync((char *)tensor->data + offset, data, size, cudaMemcpyHostToDevice, ((cudaStream_t)0x2))
current device: 0, in function ggml_backend_cuda_buffer_set_tensor at C:\a\ollama\ollama\ml\backend\ggml\ggml\src\ggml-cuda\ggml-cuda.cu:667
C:\a\ollama\ollama\ml\backend\ggml\ggml\src\ggml-cuda\ggml-cuda.cu:84: CUDA error
cudaMemcpyAsync((char *)tensor->data + offset, data, size, cudaMemcpyHostToDevice, ((cudaStream_t)0x2))
CUDA error: invalid argument
C:\a\ollama\ollama\ml\backend\ggml\ggml\src\ggml-cuda\ggml-cuda.cu:84: CUDA error
current device: 0, in function ggml_backend_cuda_buffer_set_tensor at C:\a\ollama\ollama\ml\backend\ggml\ggml\src\ggml-cuda\ggml-cuda.cu:667
cudaMemcpyAsync((char *)tensor->data + offset, data, size, cudaMemcpyHostToDevice, ((cudaStream_t)0x2))
C:\a\ollama\ollama\ml\backend\ggml\ggml\src\ggml-cuda\ggml-cuda.cu:84: CUDA error
CUDA error: invalid argument
current device: 0, in function ggml_backend_cuda_buffer_set_tensor at C:\a\ollama\ollama\ml\backend\ggml\ggml\src\ggml-cuda\ggml-cuda.cu:667
cudaMemcpyAsync((char *)tensor->data + offset, data, size, cudaMemcpyHostToDevice, ((cudaStream_t)0x2))
C:\a\ollama\ollama\ml\backend\ggml\ggml\src\ggml-cuda\ggml-cuda.cu:84: CUDA error
```


@therealkenc commented on GitHub (Sep 22, 2025):

"Same here"

Setting this environment variable fixed it for me: `OLLAMA_LLM_LIBRARY=cuda_v13`
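For reference, a minimal sketch of applying this workaround on Windows (assumes the standard Ollama install; the `cuda_v13` value presumes this Ollama build ships a CUDA v13 runner):

```shell
:: Set for the current Command Prompt session only
set OLLAMA_LLM_LIBRARY=cuda_v13

:: Or persist it for future sessions (new terminals only)
setx OLLAMA_LLM_LIBRARY cuda_v13

:: Quit and restart the Ollama app/service so the server picks it up, then retry
ollama run gemma3:12b
```

Note that `setx` does not affect the already-running shell or service, so a restart of Ollama is required either way.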


@jessegross commented on GitHub (Sep 22, 2025):

Can you please post the full log with `OLLAMA_DEBUG=1`?

If setting `OLLAMA_LLM_LIBRARY` solves the issue, then it might be the same as https://github.com/ollama/ollama/issues/11211, which is caused by overlapping installations of old and new versions and can be solved by a full reinstall.
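A sketch of capturing such a debug log on Windows (assumes a default install; quit Ollama from the system tray first so `ollama serve` can bind the port):

```shell
:: Enable verbose logging for this session, then run the server in the foreground
set OLLAMA_DEBUG=1
ollama serve

:: In a second terminal, reproduce the failure
ollama run gemma3:12b

:: The server also writes logs under %LOCALAPPDATA%\Ollama (e.g. server.log)
```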


@1574802103 commented on GitHub (Sep 23, 2025):

Uninstalling Ollama and reinstalling it fixed the problem. Thanks, jessegross.


Reference: github-starred/ollama#70274