[GH-ISSUE #6836] CUDA error #4317

Closed
opened 2026-04-12 15:14:36 -05:00 by GiteaMirror · 5 comments
Owner

Originally created by @harshallakare on GitHub (Sep 17, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/6836

Originally assigned to: @dhiltgen on GitHub.

What is the issue?

I'm getting this strange error. Everything was working fine until the last couple of days, and now I get the error below.

root@vm01:/var/log# ollama run gemma2:27b
Error: llama runner process has terminated: CUDA error: unspecified launch failure
current device: 0, in function ggml_cuda_compute_forward at /go/src/github.com/ollama/ollama/llm/llama.cpp/ggml/src/ggml-cuda.cu:2326
err
/go/src/github.com/ollama/ollama/llm/llama.cpp/ggml/src/ggml-cuda.cu:102: CUDA error
root@vm01:/var/log#

I also observed that sometimes it produces output but terminates partway through.

root@vm01:/var/log# nvidia-smi
Tue Sep 17 12:15:02 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.85.02 Driver Version: 510.85.02 CUDA Version: 11.6 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GRID A100D-40C On | 00000000:06:00.0 On | N/A |
| N/A N/A P0 N/A / N/A | 3096MiB / 40960MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1406406 G /usr/lib/xorg/Xorg 90MiB |
| 0 N/A N/A 3276533 G /usr/lib/xorg/Xorg 93MiB |
| 0 N/A N/A 3276582 G /usr/bin/gnome-shell 43MiB |
| 0 N/A N/A 4047176 C uwsgi 2837MiB |
+-----------------------------------------------------------------------------+
root@vm01:/var/log#

OS

Linux

GPU

Nvidia

CPU

Intel

Ollama version

0.3.10

GiteaMirror added the linux, nvidia, bug labels 2026-04-12 15:14:36 -05:00
Author
Owner

@rick-github commented on GitHub (Sep 18, 2024):

[Server logs](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#how-to-troubleshoot-issues) may aid in debugging.

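For reference, on a systemd-managed Linux install the troubleshooting doc linked above points at `journalctl` for the server logs. The heredoc below is a hypothetical log excerpt (not from this issue's attachment) used only to illustrate filtering for the relevant CUDA/ggml lines:

```shell
# On a systemd-managed host the live logs would come from:
#   journalctl -u ollama --no-pager | tail -n 100
# Illustration: the same filter applied to a small hypothetical excerpt.
cat > /tmp/ollama_sample.log <<'EOF'
level=INFO msg="llama runner started"
ggml_cuda_compute_forward: ROPE failed
CUDA error: unspecified launch failure
EOF
grep -iE 'cuda|ggml' /tmp/ollama_sample.log   # keeps the two error lines
```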
Author
Owner

@harshallakare commented on GitHub (Sep 18, 2024):

[ollama.log](https://github.com/user-attachments/files/17038664/ollama.log)

Sorry, I missed uploading these logs last time.

Author
Owner

@rick-github commented on GitHub (Sep 18, 2024):

You have a couple of consistent error messages:

```
ggml_cuda_compute_forward: ROPE failed
ggml_cuda_compute_forward: MUL failed
```

A few of these turn up on GitHub, and the ones with resolutions were often fixed by upgrading drivers. Yours is fairly old at CUDA 11.6; is upgrading an option?
Author
Owner

@Caspar15 commented on GitHub (Sep 19, 2024):

Possible reasons (unsure):

- CUDA driver or library version mismatch: the CUDA version is 11.6 and the NVIDIA driver version is 510.85.02. Check that the driver and CUDA are compatible.
- Insufficient GPU resources: according to nvidia-smi, the GPU is a GRID A100D-40C with about 3096 MiB of memory in use, and the uwsgi process alone holds 2837 MiB of it.
- GPU overheating or hardware failure.
- An Ollama or llama runner issue: the error appears when running `ollama run gemma2:27b`.

Things you can try:

- Update the NVIDIA driver and CUDA: try updating or reinstalling them, check whether a newer CUDA version is available, and confirm driver/CUDA compatibility.
- Free GPU memory: if the uwsgi process takes up too much GPU memory, stop it or reduce its usage. Use nvidia-smi to confirm that enough memory is free before running `ollama run`.
- Retry with a smaller model: larger models can exceed resource limits, so test with a smaller model to see whether the CUDA errors go away.
- Check the system logs: look in /var/log/syslog or /var/log/kern.log for more detailed error messages to help locate the source of the problem.
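As a sketch of the "confirm enough free memory" suggestion: `nvidia-smi --query-gpu=memory.used,memory.free --format=csv,noheader,nounits` reports the numbers in machine-readable form. The sample line below is hard-coded from the nvidia-smi output in this issue, and the ~20 GB floor for a quantized 27B model is a rough assumption, not an official figure:

```shell
# On the live host the line would come from:
#   nvidia-smi --query-gpu=memory.used,memory.free --format=csv,noheader,nounits
sample="3096, 37864"          # used MiB, free MiB (from the issue's nvidia-smi output)
free_mib=${sample##*, }       # strip everything up to the last ", " -> free MiB
# Assumed threshold: ~20 GB free for a quantized 27B model.
if [ "$free_mib" -ge 20000 ]; then
  echo "enough free VRAM ($free_mib MiB)"
else
  echo "low free VRAM ($free_mib MiB); consider a smaller model"
fi
```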

Author
Owner

@dhiltgen commented on GitHub (Sep 25, 2024):

As mentioned above, your driver is more than two years old, so let's start by having you upgrade it to pick up the numerous bug fixes in newer drivers. If you're still having problems with a current driver, please share an updated server log and I'll reopen the issue.
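A quick sanity check along these lines: CUDA 12 requires the R525 driver branch (>= 525.60.13) or newer, while R510 ships CUDA 11.6. The driver string below is hard-coded from this issue's nvidia-smi output; on a live host it would come from `nvidia-smi --query-gpu=driver_version --format=csv,noheader`. (Note the VM uses a GRID/vGPU driver, which typically has to be upgraded in step with the hypervisor's vGPU software rather than via the usual distro packages.)

```shell
# Driver branch check; version hard-coded from the nvidia-smi output above.
driver="510.85.02"
major=${driver%%.*}           # leading component, e.g. 510
# CUDA 12 needs R525 (>= 525.60.13) or newer.
if [ "$major" -lt 525 ]; then
  echo "driver branch R$major predates CUDA 12; an upgrade is recommended"
fi
```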

Reference: github-starred/ollama#4317