[GH-ISSUE #5295] "CUDA error" #49831

Closed
opened 2026-04-28 13:06:48 -05:00 by GiteaMirror · 7 comments

Originally created by @mayukhpv1997 on GitHub (Jun 26, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/5295

What is the issue?

When running "ollama run llama3", the error below is shown.

Error: llama runner process has terminated: signal: aborted (core dumped) CUDA error: CUBLAS_STATUS_EXECUTION_FAILED
current device: 0, in function ggml_cuda_mul_mat_batched_cublas at /go/src/github.com/ollama/ollama/llm/llama.cpp/ggml-cuda.cu:1855
cublasGemmBatchedEx(ctx.cublas_handle(), CUBLAS_OP_T, CUBLAS_OP_N, ne01, ne11, ne10, alpha, (const void **) (ptrs_src.get() + 0*ne23), CUDA_R_16F, nb01/nb00, (const void **) (ptrs_src.get() + 1*ne23), CUDA_R_16F, nb11/nb10, beta, ( void **) (ptrs_dst.get() + 0*ne23), cu_data_type, ne01, ne23, cu_compute_type, CUBLAS_GEMM_DEFAULT_TENSOR_OP)
GGML_ASSERT: /go/src/github.com/ollama/ollama/llm/llama.cpp/ggml-cuda.cu:100: !"CUDA error"

CUDA Version: 12.2
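
A note for reproducing: on a standard Linux install, where Ollama runs as a systemd service named "ollama", the server log carries the full runner output around this assert. A minimal way to capture it, assuming that default setup:

    # Tail the server log from the systemd unit
    journalctl -e -u ollama

    # Or stop the service and run the server in the foreground with verbose logging
    OLLAMA_DEBUG=1 ollama serve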

OS

Linux

GPU

Nvidia

CPU

Other

Ollama version

0.1.45

GiteaMirror added the bug label 2026-04-28 13:06:48 -05:00

@d-kleine commented on GitHub (Jun 26, 2024):

How much vRAM do you have on your GPU?
#4912


@mayukhpv1997 commented on GitHub (Jun 27, 2024):

32 GB


@d-kleine commented on GitHub (Jun 27, 2024):

vRAM, not RAM. Which GPU do you have?


@mayukhpv1997 commented on GitHub (Jun 28, 2024):

NVIDIA Jetson AGX Orin
[Pasted tegrastats output, garbled in the mirror: a 2024-06-28 09:53:58 snapshot showing RAM usage, EMC/GR3D/VIC frequencies, board temperatures, and VDD power-rail readings.]
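
For context: the AGX Orin has unified memory, so the GPU shares system RAM rather than having dedicated VRAM. Two commands commonly available on a Jetson give a readable snapshot of that shared pool:

    # System RAM, which on Jetson is also the GPU's memory
    free -h

    # JetPack's built-in monitor (the source of the output pasted above)
    tegrastats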


@d-kleine commented on GitHub (Jun 29, 2024):

Try the issue linked above. Some users have deleted their model and updated Ollama; maybe that resolves your issue.


@mayukhpv1997 commented on GitHub (Jul 2, 2024):

Issue resolved.
Solution: the Ollama server needs to be running in the background before "ollama run llama3".
Command: ollama serve
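
For readers landing here: a minimal sketch of the working sequence, assuming a manual install where no systemd service starts the server automatically:

    # Start the server in the background, keeping its log
    ollama serve > ollama.log 2>&1 &

    # Then run the model against it
    ollama run llama3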


@dhiltgen commented on GitHub (Jul 2, 2024):

You didn't mention which JetPack you're running. We have two open issues tracking mismatched bundled CUDA versions: #2408 and #4693.
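
For anyone checking: typical ways to read the JetPack/L4T and CUDA versions on a Jetson, assuming a standard JetPack install:

    # JetPack metapackage version
    dpkg -l | grep nvidia-jetpack

    # L4T release string
    cat /etc/nv_tegra_release

    # CUDA toolkit version, if the toolkit is installed
    nvcc --version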
