[GH-ISSUE #5839] CUDA error: CUBLAS_STATUS_NOT_INITIALIZED #3640

Closed
opened 2026-04-12 14:25:08 -05:00 by GiteaMirror · 2 comments
Owner

Originally created by @CaptainDP on GitHub (Jul 22, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/5839

What is the issue?

error msg:
CUDA error: CUBLAS_STATUS_NOT_INITIALIZED
current device: 0, in function cublas_handle at /go/src/github.com/ollama/ollama/llm/llama.cpp/ggml/src/ggml-cuda/common.cuh:826
cublasCreate_v2(&cublas_handles[device])

model: qwen2-sft, converted to GGUF with llama.cpp/convert_hf_to_gguf.py
env1: Ubuntu 20 + A800: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED
env2: macOS: works fine
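
Not part of the original report, but a minimal standalone sketch (assuming the CUDA toolkit with cuBLAS and nvcc are available; the file name is hypothetical) that exercises the same `cublasCreate_v2` entry point the runner fails on. Running it on the A800 box may help confirm whether cuBLAS can initialize at all on device 0 outside of Ollama:

```cpp
// cublas_init_check.cu -- hypothetical helper, not from the issue.
// Build: nvcc cublas_init_check.cu -lcublas -o cublas_init_check
#include <cstdio>
#include <cuda_runtime.h>
#include <cublas_v2.h>

int main() {
    // Select device 0, the device reported in the error above.
    cudaError_t cerr = cudaSetDevice(0);
    if (cerr != cudaSuccess) {
        printf("cudaSetDevice(0) failed: %s\n", cudaGetErrorString(cerr));
        return 1;
    }

    // cublasCreate is the same entry point as cublasCreate_v2 in cublas_v2.h.
    cublasHandle_t handle;
    cublasStatus_t status = cublasCreate(&handle);
    if (status != CUBLAS_STATUS_SUCCESS) {
        printf("cublasCreate failed with status %d (CUBLAS_STATUS_NOT_INITIALIZED is %d)\n",
               (int)status, (int)CUBLAS_STATUS_NOT_INITIALIZED);
        return 1;
    }

    printf("cuBLAS handle created successfully on device 0\n");
    cublasDestroy(handle);
    return 0;
}
```

If this small program also fails, the problem is likely with the driver/cuBLAS setup or the GPU itself rather than with Ollama or the converted model.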

OS

Linux, Docker

GPU

Nvidia

CPU

Intel

Ollama version

ollama version is 0.2.7

GiteaMirror added the bug label 2026-04-12 14:25:08 -05:00
Author
Owner

@CaptainDP commented on GitHub (Jul 22, 2024):

full msg:
CUDA error: CUBLAS_STATUS_NOT_INITIALIZED
current device: 0, in function cublas_handle at /go/src/github.com/ollama/ollama/llm/llama.cpp/ggml/src/ggml-cuda/common.cuh:826
cublasCreate_v2(&cublas_handles[device])
GGML_ASSERT: /go/src/github.com/ollama/ollama/llm/llama.cpp/ggml/src/ggml-cuda.cu:100: !"CUDA error"
time=2024-07-19T18:17:53.430+08:00 level=INFO source=server.go:612 msg="waiting for server to become available" status="llm server not responding"
time=2024-07-19T18:17:59.996+08:00 level=INFO source=server.go:612 msg="waiting for server to become available" status="llm server error"
time=2024-07-19T18:18:00.247+08:00 level=ERROR source=sched.go:443 msg="error loading llama server" error="llama runner process has terminated: signal: aborted (core dumped) CUDA error""
[GIN] 2024/07/19 - 18:18:00 | 500 | 17.219251827s | 127.0.0.1 | POST "/api/chat"
time=2024-07-19T18:18:05.644+08:00 level=WARN source=sched.go:634 msg="gpu VRAM usage didn't recover within timeout" seconds=5.397515954 model=/root/.ollama/models/blobs/sha256-01bf5e0739037fb9f63fd3be28135437f89cc1ea27e263f1cd2308ef1c57dd38
time=2024-07-19T18:18:05.865+08:00 level=WARN source=sched.go:634 msg="gpu VRAM usage didn't recover within timeout" seconds=5.618106055 model=/root/.ollama/models/blobs/sha256-01bf5e0739037fb9f63fd3be28135437f89cc1ea27e263f1cd2308ef1c57dd38
time=2024-07-19T18:18:06.115+08:00 level=WARN source=sched.go:634 msg="gpu VRAM usage didn't recover within timeout" seconds=5.868642664 model=/root/.ollama/models/blobs/sha256-01bf5e0739037fb9f63fd3be28135437f89cc1ea27e263f1cd2308ef1c57dd38

Author
Owner

@CaptainDP commented on GitHub (Jul 22, 2024):

Switching to a different GPU works fine.
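
Since swapping GPUs works, a per-device variant of the check above (again only a sketch with a hypothetical file name, not something from this thread) could help pin down which device fails to initialize cuBLAS:

```cpp
// per_device_cublas_check.cu -- hypothetical helper, not from the issue.
// Build: nvcc per_device_cublas_check.cu -lcublas -o per_device_cublas_check
#include <cstdio>
#include <cuda_runtime.h>
#include <cublas_v2.h>

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        printf("no CUDA devices visible\n");
        return 1;
    }

    // Try to create (and destroy) a cuBLAS handle on every visible GPU.
    for (int dev = 0; dev < count; ++dev) {
        cudaSetDevice(dev);
        cublasHandle_t handle;
        cublasStatus_t status = cublasCreate(&handle);
        if (status == CUBLAS_STATUS_SUCCESS) {
            printf("device %d: cuBLAS init OK\n", dev);
            cublasDestroy(handle);
        } else {
            printf("device %d: cuBLAS init failed, status %d\n", dev, (int)status);
        }
    }
    return 0;
}
```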

Reference: github-starred/ollama#3640