[GH-ISSUE #1083] "initialization error" when using CUDA #47047

Closed
opened 2026-04-28 02:53:29 -05:00 by GiteaMirror · 2 comments

Originally created by @Martin7-1 on GitHub (Nov 11, 2023).
Original GitHub issue: https://github.com/ollama/ollama/issues/1083

Hi! I'm using `Ollama 0.1.9` on `Ubuntu` with a Tesla V100 GPU. When I run `ollama serve` and send a `POST` request for embeddings with `CodeLlama:7b`, I encounter the error below:

```bash
2023/11/11 16:00:46 images.go:824: total blobs: 6
2023/11/11 16:00:46 images.go:831: total unused blobs removed: 0
2023/11/11 16:00:46 routes.go:696: Listening on 127.0.0.1:11434 (version 0.1.9)
2023/11/11 16:02:42 llama.go:290: 65020 MB VRAM available, loading up to 427 GPU layers
2023/11/11 16:02:42 llama.go:415: starting llama runner
2023/11/11 16:02:42 llama.go:473: waiting for llama runner to start responding

CUDA error 3 at /go/src/github.com/jmorganca/ollama/llm/llama.cpp/gguf/ggml-cuda.cu:5661: initialization error
current device: 0
2023/11/11 16:02:43 llama.go:430: 3 at /go/src/github.com/jmorganca/ollama/llm/llama.cpp/gguf/ggml-cuda.cu:5661: initialization error
current device: 0
2023/11/11 16:02:43 llama.go:438: error starting llama runner: llama runner process has terminated
2023/11/11 16:02:43 llama.go:504: llama runner stopped successfully
2023/11/11 16:02:43 llama.go:415: starting llama runner
2023/11/11 16:02:43 llama.go:473: waiting for llama runner to start responding
{"timestamp":1699689763,"level":"WARNING","function":"server_params_parse","line":871,"message":"Not compiled with GPU offload support, --n-gpu-layers option will be ignored. See main README.md for information on enabling GPU BLAS support","n_gpu_layers":-1}
{"timestamp":1699689763,"level":"INFO","function":"main","line":1323,"message":"build info","build":219,"commit":"9e70cc0"}
{"timestamp":1699689763,"level":"INFO","function":"main","line":1325,"message":"system info","n_threads":5,"n_threads_batch":-1,"total_threads":10,"system_info":"AVX = 1 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | "}
llama_model_loader: loaded meta data with 20 key-value pairs and 291 tensors from /root/.ollama/models/blobs/sha256:3a43f93b78ec50f7c4e4dc8bd1cb3fff5a900e7d574c51a6f7495e48486e0dac (version GGUF V2 (latest))
```

I would be very grateful if anyone could help with this.
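For context, the failing request can be reproduced with a call like the following (a sketch; it assumes the `/api/embeddings` endpoint of this Ollama version, and the prompt text is a placeholder):

```bash
# Hypothetical reproduction: POST to the local Ollama server's embeddings endpoint.
# The prompt is a placeholder; the model tag matches the one from the report.
curl http://127.0.0.1:11434/api/embeddings -d '{
  "model": "codellama:7b",
  "prompt": "def fib(n):"
}'
```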


@BruceMacD commented on GitHub (Nov 13, 2023):

Hi @Martin7-1,

Thanks for opening the issue. Would you be able to share the output of the `nvidia-smi` command so that I can narrow down the problem?
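Something like the following would capture both the driver and toolkit details (`nvcc` is only present if the CUDA toolkit itself is installed):

```bash
# Driver version and the highest CUDA version the driver supports (shown in the header)
nvidia-smi
# CUDA toolkit version actually installed; may lag behind what the driver supports
nvcc --version
```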


@Martin7-1 commented on GitHub (Nov 14, 2023):

Hi @BruceMacD,
I updated the CUDA version from 10.2 to 11.4 and it finally works. Maybe this error happened because the CUDA version was too low.
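For reference, CUDA error 3 is `cudaErrorInitializationError`. A quick way to confirm what an upgrade actually left on the system is something like this (a sketch, assuming a standard Linux install):

```bash
# Which libcudart the dynamic linker resolves to after the upgrade
ldconfig -p | grep libcudart
# Driver version; the driver must support the installed toolkit's CUDA version
nvidia-smi --query-gpu=driver_version --format=csv,noheader
```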
