[GH-ISSUE #5622] ollama run glm4 error - CUBLAS_STATUS_NOT_INITIALIZED #50018

Closed
opened 2026-04-28 13:50:09 -05:00 by GiteaMirror · 10 comments
Owner

Originally created by @SunMacArenas on GitHub (Jul 11, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/5622

What is the issue?

[root@hanadev system]# ollama run glm4
Error: llama runner process has terminated: signal: aborted (core dumped) CUDA error: CUBLAS_STATUS_NOT_INITIALIZED
current device: 0, in function cublas_handle at /go/src/github.com/ollama/ollama/llm/llama.cpp/ggml/src/ggml-cuda/common.cuh:826
cublasCreate_v2(&cublas_handles[device])
GGML_ASSERT: /go/src/github.com/ollama/ollama/llm/llama.cpp/ggml/src/ggml-cuda.cu:100: !"CUDA error"

NVIDIA-SMI 465.19.01 Driver Version: 465.19.01 CUDA Version: 11.3

OS

Linux

GPU

Nvidia

CPU

Intel

Ollama version

0.2.1

GiteaMirror added the memory, bug, nvidia labels 2026-04-28 13:50:10 -05:00
Author
Owner

@linxOD commented on GitHub (Jul 11, 2024):

Ollama docker image v1.4.7 works normally.
GPU: Tesla V100-PCIE-32GB
Nvidia Toolkit: V12.5

Running Ollama via the latest docker image hits a similar (or the same) issue:

/go/src/github.com/ollama/ollama/llm/llama.cpp/ggml/src/ggml-cuda/template-instances/../mmq.cuh:2422: ERROR: CUDA kernel mul_mat_q has no device code compatible with CUDA arch 700. ggml-cuda.cu was compiled for: __CUDA_ARCH_LIST__
CUDA error: unspecified launch failure
  current device: 0, in function ggml_cuda_op_mul_mat at /go/src/github.com/ollama/ollama/llm/llama.cpp/ggml/src/ggml-cuda.cu:1606
  cudaGetLastError()
GGML_ASSERT: /go/src/github.com/ollama/ollama/llm/llama.cpp/ggml/src/ggml-cuda.cu:100: !"CUDA error"
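The "no device code compatible with CUDA arch 700" line above suggests the binary was built without code for the GPU's architecture (a Tesla V100 is compute capability 7.0). As a minimal, hypothetical sketch of the check involved — `arch_supported` is not an ollama command, and it assumes the compute capability is in the `major.minor` form printed by `nvidia-smi --query-gpu=compute_cap --format=csv,noheader`:

```shell
# Hypothetical helper, not part of ollama: report whether a GPU's compute
# capability appears in the arch list a binary was built for.
#   $1: compute capability, e.g. "7.0"
#   $2: space-separated arch list, e.g. "600 610 700 750"
arch_supported() {
  cap=$(printf '%s0' "$1" | tr -d '.')   # "7.0" -> "700"
  case " $2 " in
    *" $cap "*) echo yes ;;
    *)          echo no  ;;
  esac
}

arch_supported 7.0 "600 610 700 750"   # yes: arch 700 is present
arch_supported 7.0 "600 610"           # no: a V100 would hit the error above
```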

@harrytong commented on GitHub (Jul 21, 2024):

Recently I got the same error

ollama version is 0.2.7

It only core dumps when I try

ollama run deepseek-v2:236b

Error: llama runner process has terminated: signal: aborted (core dumped) CUDA error: CUBLAS_STATUS_NOT_INITIALIZED
current device: 0, in function cublas_handle at /go/src/github.com/ollama/ollama/llm/llama.cpp/ggml/src/ggml-cuda/common.cuh:826
cublasCreate_v2(&cublas_handles[device])
GGML_ASSERT: /go/src/github.com/ollama/ollama/llm/llama.cpp/ggml/src/ggml-cuda.cu:100: !"CUDA error"

If I try a smaller model from the same vendor, no problem

ollama run deepseek-v2:16b

OS: Ubuntu 22.04 LTS
GPU: Nvidia

# nvidia-smi
Sun Jul 21 07:39:28 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 550.90.07      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3090        Off |   00000000:04:00.0 Off |                  N/A |
|  0%   42C    P8             19W /  350W |      13MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA GeForce RTX 3090        Off |   00000000:84:00.0 Off |                  N/A |
|  0%   38C    P8             22W /  350W |      13MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      2276      G   /usr/lib/xorg/Xorg                              4MiB |
|    1   N/A  N/A      2276      G   /usr/lib/xorg/Xorg                              4MiB |
+-----------------------------------------------------------------------------------------+

@harrytong commented on GitHub (Jul 21, 2024):

It used to work a few weeks ago.

ollama run deepseek-v2:236b


@dhiltgen commented on GitHub (Jul 23, 2024):

@SunMacArenas can you share more information about your setup? I'm not able to reproduce the failure, and glm4 loads correctly for me in 0.2.1 and the latest 0.2.8. How much VRAM do you have? Can you share your server log?

@harrytong can you share the `ollama ps` output from your system on the older version that worked, along with `nvidia-smi` output when the model was loaded? How much system memory do you have? If you can share the server log on the older version that worked, and the newer version that fails to load, that may also help understand what's going wrong.
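For anyone gathering the requested details, a minimal sketch follows. The `journalctl` unit name assumes a standard Linux systemd install, and `extract_cuda_errors` is a hypothetical helper (not an ollama command) whose grep patterns simply match the error strings quoted in this thread:

```shell
# Diagnostics requested above (assumes a Linux systemd install; adjust the
# unit name / log path if yours differs):
#   ollama ps
#   nvidia-smi
#   journalctl -u ollama --no-pager -n 500 > server.log

# Hypothetical helper: keep only the CUDA-related lines from a saved server
# log so they are easy to paste into the issue.
extract_cuda_errors() {
  grep -E 'CUDA error|CUBLAS_STATUS|GGML_ASSERT' "$1"
}

# Usage: extract_cuda_errors server.log
```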


@harrytong commented on GitHub (Jul 24, 2024):

Hi Daniel,
Unfortunately I cannot bring back my old configuration. I don't know if it was the CUDA 12.5.1 update and/or the Nvidia 555 driver.
Right now the only way I can run `ollama run deepseek-v2:236b` is to unplug my two RTX 3090s and let my dual-Xeon 72 cores do the inference (much slower than when the two RTX 3090s can participate).
I have a dual-Xeon machine with 256GB RAM and dual RTX 3090s (48GB GPU RAM total).
Here is my current nvidia-smi output:

***@***.***:/home/harry# nvidia-smi
Tue Jul 23 20:38:10 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.42.06              Driver Version: 555.42.06      CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3090        Off |   00000000:04:00.0 Off |                  N/A |
|  0%   31C    P8              8W /  350W |      18MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA GeForce RTX 3090        Off |   00000000:84:00.0 Off |                  N/A |
|  0%   32C    P8             10W /  350W |      18MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      2585      G   /usr/lib/xorg/Xorg                              4MiB |
|    1   N/A  N/A      2585      G   /usr/lib/xorg/Xorg                              4MiB |
+-----------------------------------------------------------------------------------------+
***@***.***:/home/harry#

Here is my ollama version:

***@***.***:/home/harry# ollama -v
ollama version is 0.2.8

BTW, with my current hardware and software configuration, it can run meta llama 3.1:405b locally without issue. It can also run deepseek-v2:latest (16b) without issue. It only fails when it tries to run deepseek-v2:236b.
-Harry
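Rather than physically unplugging the cards, CPU-only inference can usually be forced by hiding the GPUs from the server: ollama's FAQ describes assigning an invalid GPU id via `CUDA_VISIBLE_DEVICES`. A minimal sketch (the systemd note is an assumption about a typical Linux install):

```shell
# Per ollama's FAQ, an invalid GPU id makes the runner ignore the GPUs,
# forcing CPU inference without unplugging anything.
export CUDA_VISIBLE_DEVICES=-1

# Restart the server with this variable in its environment, e.g.:
#   ollama serve
# (or add it to the systemd unit's Environment= and restart the unit)
# Then: ollama run deepseek-v2:236b
```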


@harrytong commented on GitHub (Jul 25, 2024):

Hi Daniel,

Here are my nvidia-smi, ollama ps, and server.log when I try to run the following model and get the error.

***@***.***:~# ollama run deepseek-v2:236b
Error: llama runner process has terminated: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED
  current device: 0, in function cublas_handle at /go/src/github.com/ollama/ollama/llm/llama.cpp/ggml/src/ggml-cuda/common.cuh:826
  cublasCreate_v2(&cublas_handles[device])
GGML_ASSERT: /go/src/github.com/ollama/ollama/llm/llama.cpp/ggml/src/ggml-cuda.cu:101: !"CUDA

Thanks,
Harry


@harrytong commented on GitHub (Jul 25, 2024):

I am also uploading the files here.
[ollama.ps.txt](https://github.com/user-attachments/files/16372481/ollama.ps.txt)
[ollama.list.txt](https://github.com/user-attachments/files/16372482/ollama.list.txt)
[ollama.server.log](https://github.com/user-attachments/files/16372483/ollama.server.log)
[nvidia-smi.txt](https://github.com/user-attachments/files/16372484/nvidia-smi.txt)


@Howe829 commented on GitHub (Sep 13, 2024):

Upgrading to the latest version solved it for me.


@harrytong commented on GitHub (Sep 17, 2024):

Me too. Thx.


@dhiltgen commented on GitHub (Sep 17, 2024):

Great to hear the latest version is working now.

@SunMacArenas if you're still having trouble after upgrading, please share your server log and I'll reopen the issue.


Reference: github-starred/ollama#50018